
Overview

Generators define how stimuli (questions/tasks) are created. You can use multiple generators in a single target, and they’ll be combined. In bgit, generators are configured in the TARGET section of your input.yml file.
Best Practice: Always include a name field in your TARGET section. bgit appends a hash to each target name to keep it unique (e.g., my_target becomes my_target_abc123def456). A descriptive name makes the target easier to identify in recipe.yml and to reference in bake configurations.
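The following minimal Python sketch shows what a hashed target name might look like. It assumes, purely for illustration, that the suffix is the first 12 hex characters of a SHA-256 digest of the target's configuration. bgit's actual hashing input and algorithm are not documented here, and hashed_target_name is a hypothetical helper.

```python
import hashlib
import json


def hashed_target_name(name: str, target_config: dict) -> str:
    """Append a 12-character hex suffix to a target name.

    Hypothetical scheme for illustration only: bgit may hash different
    inputs or use a different algorithm.
    """
    # sort_keys makes the serialization (and therefore the hash) deterministic.
    payload = json.dumps(target_config, sort_keys=True).encode("utf-8")
    suffix = hashlib.sha256(payload).hexdigest()[:12]
    return f"{name}_{suffix}"


target = {
    "name": "my_target",
    "generators": [{"type": "hardcoded", "numq": 1, "questions": ["Implement binary search"]}],
}
print(hashed_target_name("my_target", target))  # my_target_ followed by 12 hex characters
```

In this sketch the same configuration always maps to the same suffix. The readable prefix is the part you would search for in recipe.yml.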

Generator Types

Oneshot

Let an LLM generate stimuli for you. Only Anthropic models are supported.
TARGET:
  name: my_target
  generators:
    - type: oneshot_qs
      model: claude-sonnet-4-5-20250929
      numq: 50
      temperature: 1.0
      template_path: templates/my_template.txt  # Optional: custom template
type (string, required): Must be "oneshot_qs".
model (string): Model name. Only Anthropic models are supported (e.g., "claude-sonnet-4-5-20250929").
numq (integer): Number of questions to generate.
temperature (number): Generation temperature (0.0-2.0). Higher values give more creative, random output.
template_path (string, optional): Path to a custom template file for question generation, relative to your repository root.
Best Practice: Create a templates/ directory in your repo root to organize template files:
mkdir templates
# Create your template file
cat > templates/my_template.txt << 'EOF'
Generate exactly ${numq} questions about:

${prompt_u}

Format as numbered list:
1. question1?
2. question2?
...
EOF
Then reference it in your config:
generators:
  - type: oneshot_qs
    template_path: templates/my_template.txt
Template Requirements:
  • Must include ${numq} variable (number of questions)
  • Must include ${prompt_u} variable (teacher prompt content)
  • File size limit: 1MB
  • Encoding: UTF-8
  • Uses Python’s string.Template format
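You can check the template requirements above before running a bake. The following Python sketch enforces the size limit, the UTF-8 encoding, and the two required variables, then renders the template with string.Template.substitute. load_template is a hypothetical helper and is not part of bgit. The sketch treats 1MB as 1,048,576 bytes, which is an assumption. It checks for the braced forms ${numq} and ${prompt_u} exactly as the requirements list them.

```python
import tempfile
from pathlib import Path
from string import Template

MAX_TEMPLATE_BYTES = 1024 * 1024  # "1MB" assumed to mean 1 MiB
REQUIRED_VARIABLES = ("${numq}", "${prompt_u}")


def load_template(path: str) -> Template:
    """Validate a oneshot_qs template file and return it as a string.Template (hypothetical helper)."""
    raw = Path(path).read_bytes()
    if len(raw) > MAX_TEMPLATE_BYTES:
        raise ValueError(f"{path} exceeds the 1MB size limit")
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    missing = [var for var in REQUIRED_VARIABLES if var not in text]
    if missing:
        raise ValueError(f"{path} is missing required variables: {', '.join(missing)}")
    return Template(text)


if __name__ == "__main__":
    # Write a throwaway template so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        template_file = Path(tmp) / "my_template.txt"
        template_file.write_text(
            "Generate exactly ${numq} questions about:\n\n${prompt_u}\n", encoding="utf-8"
        )
        template = load_template(str(template_file))
        # substitute() raises KeyError if a placeholder has no value.
        print(template.substitute(numq=5, prompt_u="string manipulation in Python"))
```

The sketch uses substitute rather than safe_substitute. A misspelled placeholder such as ${prompt} therefore raises an error instead of silently reaching the model.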
Use Case: Generate diverse, AI-created questions based on your prompts. Use template_path to customize the question generation style and format.

Hardcoded

Predefine a list of questions you want the prompted model to respond to.
TARGET:
  name: hardcoded_target
  generators:
    - type: hardcoded
      numq: 3
      questions:
        - "Write a function to reverse a string"
        - "Implement binary search"
        - "Create a linked list class"
type (string, required): Must be "hardcoded".
questions (array, required): List of question strings.
numq (integer, required): Number of questions; should match the length of the questions array.
Use Case: Test with specific, controlled questions.
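numq is expected to match the number of listed questions, so a quick consistency check can catch a mismatch before it reaches the pipeline. The following Python sketch runs a hypothetical check_hardcoded helper on a plain dict that mirrors the YAML above. It is not bgit's own validation.

```python
def check_hardcoded(generator: dict) -> list[str]:
    """Return the questions of a hardcoded generator after basic consistency checks."""
    if generator.get("type") != "hardcoded":
        raise ValueError('type must be "hardcoded"')
    questions = generator["questions"]
    if not all(isinstance(q, str) for q in questions):
        raise ValueError("every entry in questions must be a string")
    if generator["numq"] != len(questions):
        raise ValueError(
            f"numq is {generator['numq']} but {len(questions)} questions are listed"
        )
    return questions


# Same values as the hardcoded_target example above.
hardcoded = {
    "type": "hardcoded",
    "numq": 3,
    "questions": [
        "Write a function to reverse a string",
        "Implement binary search",
        "Create a linked list class",
    ],
}
print(check_hardcoded(hardcoded))
```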

Dataset Questions

Sample from established datasets such as SQuAD, GSM8K, MMLU, HellaSwag, and CodeContests.
TARGET:
  name: dataset_target
  generators:
    - type: from_dataset
      dataset: code_contests
      numq: 100
      seed: 42
type (string, required): Must be "from_dataset".
dataset (string, required): Dataset name (e.g., "squad", "gsm8k", "mmlu", "hellaswag", "code_contests").
numq (integer): Number of questions to sample from the dataset.
seed (integer): Random seed for reproducible sampling.
Use Case: Use established benchmarks or datasets.
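Sampling is random, so a fixed seed is what makes a draw repeatable. The following Python sketch illustrates this with random.Random. It assumes bgit draws numq distinct items using a seeded random number generator, but the actual sampling method is not documented here. sample_questions is hypothetical, and the pool stands in for a loaded dataset.

```python
import random


def sample_questions(pool: list[str], numq: int, seed: int) -> list[str]:
    """Draw numq distinct items from pool; the same seed reproduces the same draw."""
    rng = random.Random(seed)  # dedicated RNG, unaffected by other uses of the random module
    return rng.sample(pool, numq)


pool = [f"problem {i}" for i in range(1000)]  # stand-in for a dataset such as code_contests
first = sample_questions(pool, numq=100, seed=42)
second = sample_questions(pool, numq=100, seed=42)
print(first == second)  # True: identical seed, identical sample
```

In this sketch, an unseeded random.Random() would produce a different sample on every run, which makes results hard to compare.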

Persona

Draw on a dataset curated specifically for baking personas.
TARGET:
  name: persona_target
  generators:
    - type: persona
      numq: 25
      seed: 123
      temperature: 0.9
type (string, required): Must be "persona".
numq (integer): Number of questions to generate.
seed (integer): Random seed for reproducibility.
temperature (number): Generation temperature (0.0-2.0).
Use Case: Generate questions from different personas or perspectives.

Custom Template

Write your own prompt for a language model, which then generates stimuli on your behalf.
Note: Custom template generators are currently in development. This feature may not be fully available yet.
TARGET:
  name: custom_target
  generators:
    - type: custom
      template_path: "/path/to/template.yaml"
type (string, required): Must be "custom".
template_path (string, required): Path to the custom template file.
Use Case: Advanced custom generation logic.
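The parameter tables above can be condensed into a pre-flight check. The following Python sketch encodes the fields marked required for each generator type, plus the 0.0-2.0 temperature range documented for oneshot_qs and persona. validate_generator is hypothetical, and bgit's own validation may check more or check differently.

```python
# Fields marked "required" in the parameter tables, keyed by generator type.
REQUIRED_FIELDS = {
    "oneshot_qs": {"type"},
    "hardcoded": {"type", "questions", "numq"},
    "from_dataset": {"type", "dataset"},
    "persona": {"type"},
    "custom": {"type", "template_path"},
}


def validate_generator(gen: dict) -> list[str]:
    """Return a list of problems with one generator entry (an empty list means none found)."""
    gen_type = gen.get("type")
    if gen_type not in REQUIRED_FIELDS:
        return [f"unknown generator type: {gen_type!r}"]
    problems = [
        f"{gen_type}: missing required field '{field}'"
        for field in sorted(REQUIRED_FIELDS[gen_type] - set(gen))
    ]
    temperature = gen.get("temperature")
    in_range = isinstance(temperature, (int, float)) and 0.0 <= temperature <= 2.0
    if temperature is not None and not in_range:
        problems.append(f"{gen_type}: temperature {temperature} is outside 0.0-2.0")
    return problems


generators = [
    {"type": "oneshot_qs", "model": "claude-sonnet-4-5-20250929", "numq": 50, "temperature": 1.0},
    {"type": "custom"},  # template_path is required for custom generators
]
for gen in generators:
    print(gen["type"], validate_generator(gen) or "OK")
```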

Combining Generators

Combining generators produces a more diverse training dataset. List each generator under generators in your TARGET section:
TARGET:
  name: combined_target
  generators:
    - type: oneshot_qs
      model: claude-sonnet-4-5-20250929
      numq: 30
    
    - type: from_dataset
      dataset: code_contests
      numq: 20
      seed: 42
    
    - type: hardcoded
      numq: 2
      questions:
        - "Implement a specific edge case"
        - "Handle this corner case"
The resulting dataset mixes AI-generated, benchmark, and hand-written (hardcoded) questions.
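This page does not specify how bgit merges generator outputs (ordering, shuffling, or deduplication). If the merge is a plain concatenation, combined_target above would contribute 30 + 20 + 2 = 52 stimuli, as the following Python sketch shows. combine is hypothetical, and the placeholder strings stand in for real generator output.

```python
def combine(outputs: dict[str, list[str]]) -> list[dict]:
    """Concatenate per-generator question lists, tagging each stimulus with its source."""
    combined = []
    for source, questions in outputs.items():
        combined.extend({"source": source, "question": q} for q in questions)
    return combined


outputs = {
    "oneshot_qs": [f"generated question {i}" for i in range(30)],      # placeholder output
    "from_dataset": [f"code_contests problem {i}" for i in range(20)],  # placeholder output
    "hardcoded": ["Implement a specific edge case", "Handle this corner case"],
}
stimuli = combine(outputs)
print(len(stimuli))  # 52
```

Tagging each stimulus with its source makes it easy to check afterwards how much of the dataset each generator contributed.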

Examples

Code Generation

TARGET:
  name: code_generation
  generators:
    - type: oneshot_qs
      model: claude-sonnet-4-5-20250929
      numq: 100
      temperature: 1.2

Math Problems

TARGET:
  name: math_problems
  generators:
    - type: from_dataset
      dataset: gsm8k
      numq: 50
      seed: 42

Specific Test Cases

TARGET:
  name: specific_test_cases
  generators:
    - type: hardcoded
      numq: 3
      questions:
        - "Handle empty input"
        - "Process maximum size input"
        - "Deal with special characters"

Best Practices

Combine different generators for diverse training data
Set seed values when using from_dataset or persona for reproducible results
Test your pipeline with hardcoded questions before scaling to AI generation
Adjust temperature based on creativity needs (lower = more focused, higher = more creative)
For oneshot_qs generators, create a templates/ directory and use template_path to customize question generation style. Templates use Python’s string.Template format with ${numq} and ${prompt_u} variables.

Next Steps