Generator Types

Overview

Generators define how stimuli (questions/tasks) are created. You can use multiple generators in a single target; their questions are combined into one dataset.

Oneshot

Let an LLM (Anthropic models only) generate stimuli for you.

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 50,
    "temperature": 1.0,
    "template_path": "templates/my_template.txt",  # Optional: custom template
    "template_content": "Generate exactly ${numq} questions about:\n\n${prompt_u}\n\nFormat as numbered list:"  # Optional: template content
}

type

string

required

Must be "oneshot_qs"

model

string

Model name. Only Anthropic models are supported (e.g., "claude-sonnet-4-5-20250929").

numq

integer

Number of questions to generate

temperature

number

Generation temperature (0.0-2.0). Higher = more creative/random

template_path

string

Optional: Path to a custom template file for question generation. Paths are normalized to the /templates/{template_name} format. When template_content is also provided, that content is written to this file in the repository. Best Practice: Use just the filename (e.g., "my_template.txt") or the full path (e.g., "templates/my_template.txt"); the system normalizes either to /templates/{template_name}.

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 50,
    "template_path": "templates/coding_template.txt"
}
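The normalization rule above can be sketched as a small function. normalize_template_path is a hypothetical helper, not part of the SDK, and it assumes only the final path component is kept. That matches both documented inputs, but how the service treats nested directories is not documented here.

```python
import posixpath


def normalize_template_path(template_path):
    """Hypothetical helper: map a template_path to /templates/{template_name}.

    Assumption: only the final path component (the file name) is kept.
    """
    template_name = posixpath.basename(template_path.strip())
    if not template_name:
        raise ValueError(f"template_path has no file name: {template_path!r}")
    return f"/templates/{template_name}"


print(normalize_template_path("my_template.txt"))            # /templates/my_template.txt
print(normalize_template_path("templates/my_template.txt"))  # /templates/my_template.txt
```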

Template Requirements:

Must include ${numq} variable (number of questions)
Must include ${prompt_u} variable (teacher prompt content)
File size limit: 1MB
Encoding: UTF-8
Uses Python’s string.Template format
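Before uploading, you might check a template against these requirements locally. check_template is a hypothetical helper, not an SDK call. Its placeholder regex mirrors string.Template's default syntax ($$ escapes a dollar sign; $name and ${name} are placeholders), and it assumes "1MB" means 1 MiB (1,048,576 bytes).

```python
import re

# Mirrors string.Template's default syntax: "$$" is an escaped dollar sign,
# while "${name}" and "$name" are placeholders.
_PLACEHOLDER = re.compile(
    r"\$(?:(\$)|\{([_a-zA-Z][_a-zA-Z0-9]*)\}|([_a-zA-Z][_a-zA-Z0-9]*))"
)
REQUIRED_VARIABLES = {"numq", "prompt_u"}
MAX_TEMPLATE_BYTES = 1024 * 1024  # assumption: "1MB" is read as 1 MiB


def check_template(content):
    """Hypothetical pre-upload check; raises ValueError on the first problem found."""
    size = len(content.encode("utf-8"))  # size of the UTF-8 encoded file
    if size > MAX_TEMPLATE_BYTES:
        raise ValueError(f"template is {size} bytes; limit is {MAX_TEMPLATE_BYTES}")

    found = set()
    for match in _PLACEHOLDER.finditer(content):
        name = match.group(2) or match.group(3)  # both None for an escaped "$$"
        if name:
            found.add(name)

    missing = sorted(REQUIRED_VARIABLES - found)
    if missing:
        raise ValueError("template is missing: " + ", ".join("${%s}" % n for n in missing))


check_template("Generate exactly ${numq} questions about:\n\n${prompt_u}")  # passes
```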

template_content

string

Optional: Template file content (for upload). When provided along with template_path, the content is written to the template file in the repository. If you provide template_content without template_path, a default filename is generated, so it's recommended to provide template_path and template_content together.

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 50,
    "template_path": "templates/my_template.txt",
    "template_content": "Generate exactly ${numq} questions about:\n\n${prompt_u}\n\nFormat as numbered list:\n1. question1?\n2. question2?\n..."
}

Template Variables:

${numq}: Number of questions to generate
${prompt_u}: Teacher prompt content (unconditioned stimulus)
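The page states that templates use Python's string.Template format. Assuming the service fills the placeholders with Template.substitute, a template renders as shown below; the teacher prompt is a made-up example.

```python
from string import Template

template_content = (
    "Generate exactly ${numq} questions about:\n\n${prompt_u}\n\n"
    "Format as numbered list:"
)

# substitute() raises KeyError for any placeholder without a value;
# safe_substitute() would leave such a placeholder in the text instead.
rendered = Template(template_content).substitute(
    numq=3,
    prompt_u="You are a concise Python tutor.",  # hypothetical teacher prompt
)
print(rendered)
```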

Use Case: Generate diverse, AI-created questions based on your prompts. Use template_path and template_content to customize the question generation style and format.

Hardcoded

Predefine a list of questions you want the prompted model to respond to.

{
    "type": "hardcoded",
    "numq": 3,
    "questions": [
        "Write a function to reverse a string",
        "Implement binary search",
        "Create a linked list class"
    ]
}

type

string

required

Must be "hardcoded"

questions

array

required

List of question strings

numq

integer

required

Number of questions (should match length of questions array)

Use Case: Test with specific, controlled questions
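Because numq should match the length of the questions array, one option is to derive it instead of typing it by hand. hardcoded_generator is a hypothetical convenience function, not an SDK call, and its non-empty-string check is this sketch's own choice.

```python
import json


def hardcoded_generator(questions):
    """Hypothetical helper: build a "hardcoded" config with numq = len(questions)."""
    if not questions:
        raise ValueError("questions must not be empty")
    if not all(isinstance(q, str) and q.strip() for q in questions):
        raise ValueError("every question must be a non-empty string")
    return {"type": "hardcoded", "numq": len(questions), "questions": list(questions)}


config = hardcoded_generator([
    "Write a function to reverse a string",
    "Implement binary search",
    "Create a linked list class",
])
print(json.dumps(config, indent=4))
```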

Dataset Questions

Sample from established datasets such as SQuAD, GSM8K, MMLU, and HellaSwag.

{
    "type": "from_dataset",
    "dataset": "code_contests",
    "numq": 100,
    "seed": 42
}

type

string

required

Must be "from_dataset"

dataset

string

required

Dataset name (e.g., "squad", "gsm8k", "mmlu", "hellaswag")

numq

integer

Number of questions to sample from dataset

seed

integer

Random seed for reproducible sampling

Use Case: Use established benchmarks or datasets
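The seed makes sampling deterministic: the same dataset, numq, and seed should yield the same questions. The sketch below shows the idea with Python's random module on a stand-in pool; sample_questions is hypothetical, and how the service actually loads datasets and samples from them is not documented here.

```python
import random


def sample_questions(pool, numq, seed):
    """Draw numq distinct items from pool; the same seed always gives the same draw."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    return rng.sample(pool, numq)  # raises ValueError if numq > len(pool)


pool = [f"question {i}" for i in range(1000)]  # stand-in for a dataset split
first = sample_questions(pool, 100, seed=42)
second = sample_questions(pool, 100, seed=42)
print(first == second)  # True
```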

Common Generator Fields

These fields can be used with any generator type:

rollout_with_conditioned

boolean

If true, trajectories are generated from the conditioned_stimulus (student_prompt) instead of the unconditioned_stimulus (teacher_prompt), and a trajectory_override_stimulus field is added to the stimulus output. Default: false (trajectories use the unconditioned stimulus).

{
    "type": "hardcoded",
    "numq": 1,
    "questions": ["Question 1"],
    "rollout_with_conditioned": true
}
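One way to picture this option is as a choice between two prompts per stimulus. The function below is a hypothetical model, not the service's code: the stimulus dictionary keys and the value stored in trajectory_override_stimulus are assumptions.

```python
def choose_trajectory_prompt(stimulus, rollout_with_conditioned=False):
    """Hypothetical model of rollout_with_conditioned; not the service's implementation.

    Returns (prompt used for trajectory generation, stimulus as emitted).
    The key names and the override value are assumptions.
    """
    if not rollout_with_conditioned:
        # Default (false): trajectories use the unconditioned stimulus (teacher_prompt).
        return stimulus["unconditioned_stimulus"], stimulus

    # True: trajectories use the conditioned stimulus (student_prompt), and the
    # emitted stimulus gains a trajectory_override_stimulus field.
    emitted = dict(stimulus)  # copy so the caller's dict is left unchanged
    emitted["trajectory_override_stimulus"] = stimulus["conditioned_stimulus"]
    return emitted["trajectory_override_stimulus"], emitted


stimulus = {
    "unconditioned_stimulus": "<teacher_prompt + Question 1>",  # placeholder values
    "conditioned_stimulus": "<student_prompt + Question 1>",
}
prompt, emitted = choose_trajectory_prompt(stimulus, rollout_with_conditioned=True)
print(prompt)  # <student_prompt + Question 1>
```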

Persona

A curated dataset built specifically for baking personas.

{
    "type": "persona",
    "numq": 25,
    "seed": 123,
    "temperature": 0.9
}

type

string

required

Must be "persona"

numq

integer

Number of questions to generate

seed

integer

Random seed for reproducibility

temperature

number

Generation temperature (0.0-2.0)

Use Case: Generate questions from different personas or perspectives
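A local pre-flight check can catch config mistakes before a target is submitted. validate_generator is hypothetical and encodes only what the field descriptions on this page state: the required fields per type, the 0.0-2.0 temperature range, and numq matching the questions list. The service may enforce additional or different rules.

```python
# Required fields per generator type, as marked "required" in the field lists.
REQUIRED_FIELDS = {
    "oneshot_qs": {"type"},
    "hardcoded": {"type", "questions", "numq"},
    "from_dataset": {"type", "dataset"},
    "persona": {"type"},
}


def validate_generator(config):
    """Hypothetical check: return a list of problems in one generator config."""
    gen_type = config.get("type")
    if gen_type not in REQUIRED_FIELDS:
        return [f"unknown generator type: {gen_type!r}"]

    problems = [f"missing required field: {name}"
                for name in sorted(REQUIRED_FIELDS[gen_type] - set(config))]

    temperature = config.get("temperature")
    if temperature is not None and not (
        isinstance(temperature, (int, float)) and 0.0 <= temperature <= 2.0
    ):
        problems.append(f"temperature must be a number in 0.0-2.0, got {temperature!r}")

    questions = config.get("questions")
    if gen_type == "hardcoded" and questions is not None and config.get("numq") != len(questions):
        problems.append("numq does not match len(questions)")
    return problems


print(validate_generator({"type": "persona", "numq": 25, "seed": 123, "temperature": 0.9}))
# []
print(validate_generator({"type": "hardcoded", "questions": ["Question 1"]}))
# ['missing required field: numq', 'numq does not match len(questions)']
```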

Combining Generators

Listing multiple generators in one target mixes their questions into a single, more diverse training dataset. Pass them as a list under the "generators" key of overrides:

# `client` is assumed to be an already-initialized SDK client.
target = client.targets.set(
    target_name="multi_gen_target",
    repo_name="my_repo",
    template="default",
    overrides={
        "generators": [
            {
                "type": "oneshot_qs",
                "model": "claude-sonnet-4-5-20250929",
                "numq": 30,
                "template_path": "templates/coding_template.txt"  # Optional: custom template
            },
            {
                "type": "from_dataset",
                "dataset": "code_contests",
                "numq": 20
            },
            {
                "type": "hardcoded",
                "numq": 2,
                "questions": [
                    "Implement a specific edge case",
                    "Handle this corner case"
                ]
            }
        ],
        "model_name": "Qwen/Qwen3-32B",
        "teacher_prompt": "system_prompt",  # Teacher: behavior to bake
        "student_prompt": "user_prompt"     # Student: trigger (or "" for empty)
    }
)

This target mixes AI-generated questions (oneshot_qs), benchmark questions sampled from code_contests (from_dataset), and custom questions (hardcoded) into one dataset.
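Because each generator sets its own numq, the mix of a combined dataset can be read straight off the config. This sketch tallies the requested counts for the example above; requested_counts is a hypothetical helper, and it assumes the generators' outputs are simply concatenated, so the final number of stimuli could differ if the service filters or deduplicates questions.

```python
generators = [
    {"type": "oneshot_qs", "model": "claude-sonnet-4-5-20250929", "numq": 30},
    {"type": "from_dataset", "dataset": "code_contests", "numq": 20},
    {"type": "hardcoded", "numq": 2,
     "questions": ["Implement a specific edge case", "Handle this corner case"]},
]


def requested_counts(generators):
    """Map each generator type to the total number of questions it requests."""
    counts = {}
    for gen in generators:
        counts[gen["type"]] = counts.get(gen["type"], 0) + gen.get("numq", 0)
    return counts


counts = requested_counts(generators)
print(counts)                # {'oneshot_qs': 30, 'from_dataset': 20, 'hardcoded': 2}
print(sum(counts.values()))  # 52
```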

Examples

Code Generation

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 100,
    "temperature": 1.2,
    "template_path": "templates/coding_template.txt"  # Optional: custom template
}

Code Generation with Custom Template

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 100,
    "temperature": 1.2,
    "template_path": "templates/coding_template.txt",
    "template_content": "Generate exactly ${numq} coding questions about:\n\n${prompt_u}\n\nFormat as numbered list:\n1. question1?\n2. question2?\n..."
}

Math Problems

{
    "type": "from_dataset",
    "dataset": "math_problems",
    "numq": 50,
    "seed": 42
}

Specific Test Cases

{
    "type": "hardcoded",
    "numq": 3,
    "questions": [
        "Handle empty input",
        "Process maximum size input",
        "Deal with special characters"
    ]
}

Best Practices

Mix Generator Types

Combine different generators for diverse training data

Use Seeds for Reproducibility

Set seed values when using from_dataset or persona for reproducible results

Start with Hardcoded

Test your pipeline with hardcoded questions before scaling to AI generation

Tune Temperature

Adjust temperature based on creativity needs (lower = more focused, higher = more creative)

Use Custom Templates

For oneshot_qs generators, use template_path and template_content to customize question generation style. Templates use Python’s string.Template format with ${numq} and ${prompt_u} variables.
