Generator Types

Overview

Generators define how stimuli (questions/tasks) are created. You can use multiple generators in a single target, and they’ll be combined. In bgit, generators are configured in the TARGET section of your input.yml file.

Best Practice: Always include a name field in your TARGET section. A hash is appended to each target name to ensure uniqueness (e.g., my_target becomes my_target_abc123def456), so an explicit name makes the target easier to identify in recipe.yml and to reference in bake configurations.
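To make the naming pattern concrete, here is a minimal Python sketch of appending a hash suffix to a readable name. The function name suffixed_name, the use of SHA-256, and the 12-character length are assumptions chosen to mirror the example above; the scheme bgit actually uses is not documented here and may differ.

```python
import hashlib


def suffixed_name(name: str) -> str:
    """Append a 12-character hex digest to `name` (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{name}_{digest[:12]}"


hashed = suffixed_name("my_target")
print(hashed)  # "my_target_" followed by 12 hex characters

# The readable prefix survives, which is why an explicit name helps
# when you look for the target later.
print(hashed.startswith("my_target_"))  # True
```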

Oneshot

Let an LLM (Anthropic models only) generate stimuli for you.

TARGET:
  name: my_target
  generators:
    - type: oneshot_qs
      model: claude-sonnet-4-5-20250929
      numq: 50
      temperature: 1.0
      template_path: templates/my_template.txt  # Optional: custom template

type (string, required): Must be "oneshot_qs".

model (string): Model name. Only Anthropic models are supported (e.g., "claude-sonnet-4-5-20250929").

numq (integer): Number of questions to generate.

temperature (number): Generation temperature (0.0-2.0). Higher values give more creative/random output.

template_path (string): Optional. Path to a custom template file for question generation. Paths are relative to your repository root.

Best Practice: Create a templates/ directory in your repo root to organize template files:

mkdir -p templates
# Create your template file; the quoted 'EOF' keeps ${numq} and ${prompt_u} literal
cat > templates/my_template.txt << 'EOF'
Generate exactly ${numq} questions about:

${prompt_u}

Format as numbered list:
1. question1?
2. question2?
...
EOF

Then reference it in your config:

generators:
  - type: oneshot_qs
    template_path: templates/my_template.txt

Template Requirements:

Must include ${numq} variable (number of questions)
Must include ${prompt_u} variable (teacher prompt content)
File size limit: 1MB
Encoding: UTF-8
Uses Python’s string.Template format
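The requirements above can be checked locally before you run anything. The following is a minimal sketch using Python's standard string.Template; the helper names placeholders and missing_vars and the sample prompt text are illustrative assumptions, not part of bgit.

```python
from string import Template

REQUIRED_VARS = {"numq", "prompt_u"}


def placeholders(text: str) -> set[str]:
    """Collect the $name and ${name} identifiers used in a template."""
    found = set()
    for match in Template.pattern.finditer(text):
        name = match.group("named") or match.group("braced")
        if name:
            found.add(name)
    return found


def missing_vars(text: str) -> set[str]:
    """Return the required variables the template does not reference."""
    return REQUIRED_VARS - placeholders(text)


template_text = (
    "Generate exactly ${numq} questions about:\n\n"
    "${prompt_u}\n\n"
    "Format as numbered list:\n"
    "1. question1?\n"
    "2. question2?\n"
    "...\n"
)

print(missing_vars(template_text))             # set()
print(missing_vars("Ask about ${prompt_u}"))   # {'numq'}

# Render the template the way string.Template substitution works.
rendered = Template(template_text).substitute(
    numq=3, prompt_u="Python list comprehensions"
)
print(rendered)
```

Because substitute raises KeyError for any placeholder without a value, a stray variable such as ${topic} in the template would fail at render time under this approach.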

Use Case: Generate diverse, AI-created questions based on your prompts. Use template_path to customize the question generation style and format.

Hardcoded

Predefine a list of questions you want the prompted model to respond to.

TARGET:
  name: hardcoded_target
  generators:
    - type: hardcoded
      numq: 3
      questions:
        - "Write a function to reverse a string"
        - "Implement binary search"
        - "Create a linked list class"

type (string, required): Must be "hardcoded".

questions (array, required): List of question strings.

numq (integer, required): Number of questions (should match the length of the questions array).
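Since numq should match the length of the questions array, a quick pre-flight check can catch a mismatch early. This is a sketch only: check_hardcoded is a hypothetical helper, and how bgit itself reacts to a mismatch is not stated here.

```python
def check_hardcoded(generator: dict) -> None:
    """Raise ValueError if numq disagrees with the number of questions."""
    questions = generator.get("questions", [])
    numq = generator.get("numq")
    if numq != len(questions):
        raise ValueError(
            f"numq is {numq} but {len(questions)} questions are listed"
        )


good = {
    "type": "hardcoded",
    "numq": 3,
    "questions": [
        "Write a function to reverse a string",
        "Implement binary search",
        "Create a linked list class",
    ],
}
check_hardcoded(good)  # passes silently

bad = dict(good, numq=5)  # same questions, wrong count
try:
    check_hardcoded(bad)
except ValueError as err:
    print(err)  # numq is 5 but 3 questions are listed
```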

Use Case: Test with specific, controlled questions

Dataset Questions

Sample questions from established datasets such as SQuAD, GSM8K, MMLU, and HellaSwag.

TARGET:
  name: dataset_target
  generators:
    - type: from_dataset
      dataset: code_contests
      numq: 100
      seed: 42

type (string, required): Must be "from_dataset".

dataset (string, required): Dataset name (e.g., "squad", "gsm8k", "mmlu", "hellaswag", "code_contests").

numq (integer): Number of questions to sample from the dataset.

seed (integer): Random seed for reproducible sampling.
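To illustrate what a seed buys you, here is a small Python sketch of seeded sampling with the standard random module. It demonstrates the general technique only; sample_questions and the placeholder pool are assumptions, and bgit's own sampling code may work differently.

```python
import random


def sample_questions(pool: list[str], numq: int, seed: int) -> list[str]:
    """Draw numq distinct items using a private, seeded generator."""
    rng = random.Random(seed)  # local RNG; global random state is untouched
    return rng.sample(pool, numq)


# Stand-in for a dataset's question pool.
pool = [f"question {i}" for i in range(1000)]

first = sample_questions(pool, 100, seed=42)
second = sample_questions(pool, 100, seed=42)

print(first == second)  # True: same seed and pool give the same sample
print(len(set(first)))  # 100: sampling is without replacement
```

With the same seed and the same underlying data, a rerun selects the same questions, which makes results comparable across experiments.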

Use Case: Use established benchmarks or datasets

Persona

Uses a dataset curated specifically for baking personas.

TARGET:
  name: persona_target
  generators:
    - type: persona
      numq: 25
      seed: 123
      temperature: 0.9

type (string, required): Must be "persona".

numq (integer): Number of questions to generate.

seed (integer): Random seed for reproducibility.

temperature (number): Generation temperature (0.0-2.0).

Use Case: Generate questions from different personas or perspectives

Custom Template

Write your own prompt instructing a language model to generate stimuli on your behalf.

Note: Custom template generators are currently in development. This feature may not be fully available yet.

TARGET:
  name: custom_target
  generators:
    - type: custom
      template_path: "/path/to/template.yaml"

type (string, required): Must be "custom".

template_path (string, required): Path to the custom template file.

Use Case: Advanced custom generation logic

Combining Generators

Combining generators produces more diverse training data. List each one under generators in your TARGET section:

TARGET:
  name: combined_target
  generators:
    - type: oneshot_qs
      model: claude-sonnet-4-5-20250929
      numq: 30
    
    - type: from_dataset
      dataset: code_contests
      numq: 20
      seed: 42
    
    - type: hardcoded
      numq: 2
      questions:
        - "Implement a specific edge case"
        - "Handle this corner case"

Together, these generators produce a dataset that mixes AI-generated, benchmark, and hand-written questions.
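As a rough sanity check on dataset size, you can add up numq across the generators in a target. The sketch below assumes that combining means pooling every generator's output, that each generator yields exactly numq stimuli, and that nothing is deduplicated; none of these is confirmed here.

```python
# Mirrors the combined_target example, reduced to the fields needed here.
generators = [
    {"type": "oneshot_qs", "numq": 30},
    {"type": "from_dataset", "numq": 20},
    {"type": "hardcoded", "numq": 2},
]

per_type = {g["type"]: g["numq"] for g in generators}
total = sum(g["numq"] for g in generators)

print(per_type)  # {'oneshot_qs': 30, 'from_dataset': 20, 'hardcoded': 2}
print(total)     # 52
```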

Examples

Code Generation

TARGET:
  name: code_generation_target
  generators:
    - type: oneshot_qs
      model: claude-sonnet-4-5-20250929
      numq: 100
      temperature: 1.2

Math Problems

TARGET:
  name: math_target
  generators:
    - type: from_dataset
      dataset: gsm8k
      numq: 50
      seed: 42

Specific Test Cases

TARGET:
  name: test_cases_target
  generators:
    - type: hardcoded
      numq: 3
      questions:
        - "Handle empty input"
        - "Process maximum size input"
        - "Deal with special characters"

Best Practices

Mix Generator Types

Combine different generators for more diverse training data.

Use Seeds for Reproducibility

Set seed values when using from_dataset or persona so results are reproducible.

Start with Hardcoded

Test your pipeline with hardcoded questions before scaling up to AI generation.

Tune Temperature

Adjust temperature to match your creativity needs (lower = more focused, higher = more creative).

Use Custom Templates

For oneshot_qs generators, create a templates/ directory and use template_path to customize question generation style. Templates use Python’s string.Template format with ${numq} and ${prompt_u} variables.

Next Steps

Prompt Configuration

Configure teacher and student prompts

Target Configuration

Complete target configuration reference

Bake Configuration

Configure bake settings
