
Overview

Generators define how stimuli (questions/tasks) are created. You can use multiple generators in a single target, and they’ll be combined.

Generator Types

Oneshot

Let an LLM (Anthropic models only) generate stimuli for you.
{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 50,
    "temperature": 1.0,
    "template_path": "templates/my_template.txt",  # Optional: custom template
    "template_content": "Generate exactly ${numq} questions about:\n\n${prompt_u}\n\nFormat as numbered list:"  # Optional: template content
}
type
string
required
Must be "oneshot_qs"
model
string
Model name. Only Anthropic models are supported (e.g., "claude-sonnet-4-5-20250929").
numq
integer
Number of questions to generate
temperature
number
Generation temperature (0.0-2.0). Higher = more creative/random
template_path
string
Optional: Path to a custom template file for question generation. Paths are normalized to /templates/{template_name} format. When template_content is provided, the file is written to the repository.
Best Practice: Use just the filename (e.g., "my_template.txt") or the full normalized path (e.g., "templates/my_template.txt"). The system will normalize it to /templates/{template_name}.
{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 50,
    "template_path": "templates/coding_template.txt"
}
Template Requirements:
  • Must include ${numq} variable (number of questions)
  • Must include ${prompt_u} variable (teacher prompt content)
  • File size limit: 1MB
  • Encoding: UTF-8
  • Uses Python’s string.Template format
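The requirements above can be checked locally before uploading a template. The following is a minimal sketch, not part of the documented API: `check_template`, `REQUIRED_VARS`, and `MAX_BYTES` are hypothetical names, and treating 1MB as 1024 * 1024 bytes is an assumption. It relies only on the standard library's `string.Template` placeholder pattern.

```python
from string import Template

REQUIRED_VARS = {"numq", "prompt_u"}  # variables every template must include
MAX_BYTES = 1024 * 1024               # assumption: "1MB" means 1024 * 1024 bytes


def check_template(text: str) -> list[str]:
    """Return a list of problems found in a template; an empty list means none."""
    problems = []
    try:
        size = len(text.encode("utf-8"))
    except UnicodeEncodeError:
        return ["template cannot be encoded as UTF-8"]
    if size > MAX_BYTES:
        problems.append(f"template is {size} bytes, over the {MAX_BYTES}-byte limit")

    # Template.pattern is the regex string.Template itself uses to find placeholders.
    found = set()
    for match in Template.pattern.finditer(text):
        name = match.group("named") or match.group("braced")
        if name:
            found.add(name)
        elif match.group("invalid") is not None:
            # A lone "$" that is neither "$$" nor a valid placeholder.
            problems.append(f"invalid placeholder at offset {match.start()}")

    for var in sorted(REQUIRED_VARS - found):
        problems.append(f"missing required variable ${{{var}}}")
    return problems
```

A literal dollar sign in a template must be written as `$$`; otherwise `string.Template` treats it as an invalid placeholder, which this check reports.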
template_content
string
Optional: Template file content (for upload). When provided along with template_path, the content is written to the template file in the repository. If you provide template_content without template_path, a default filename will be generated. It’s recommended to provide both template_path and template_content together.
{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 50,
    "template_path": "templates/my_template.txt",
    "template_content": "Generate exactly ${numq} questions about:\n\n${prompt_u}\n\nFormat as numbered list:\n1. question1?\n2. question2?\n..."
}
Template Variables:
  • ${numq}: Number of questions to generate
  • ${prompt_u}: Teacher prompt content (unconditioned stimulus)
Use Case: Generate diverse, AI-created questions based on your prompts. Use template_path and template_content to customize the question generation style and format.
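To preview what a template produces once its variables are filled in, you can render it locally with `string.Template`. This is only an illustrative sketch: the teacher prompt text is made up, and how the service performs the substitution internally is not shown in this documentation.

```python
from string import Template

template_content = (
    "Generate exactly ${numq} questions about:\n\n"
    "${prompt_u}\n\n"
    "Format as numbered list:"
)

# substitute() raises KeyError if a variable in the template is not supplied,
# which makes missing values easy to spot during a dry run.
rendered = Template(template_content).substitute(
    numq=3,
    prompt_u="Answer like a patient Python tutor.",  # hypothetical teacher prompt
)
print(rendered)
```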

Hardcoded

Predefine a list of questions you want the prompted model to respond to.
{
    "type": "hardcoded",
    "numq": 3,
    "questions": [
        "Write a function to reverse a string",
        "Implement binary search",
        "Create a linked list class"
    ]
}
type
string
required
Must be "hardcoded"
questions
array
required
List of question strings
numq
integer
required
Number of questions (should match length of questions array)
Use Case: Test with specific, controlled questions
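Because numq should match the length of the questions array, it can help to derive it rather than type it by hand. A minimal sketch, assuming a hypothetical helper named `hardcoded_generator` that is not part of the SDK:

```python
def hardcoded_generator(questions: list[str]) -> dict:
    """Build a hardcoded generator config whose numq always matches the list."""
    if not questions:
        raise ValueError("questions must contain at least one entry")
    return {
        "type": "hardcoded",
        "numq": len(questions),        # derived, so it cannot drift out of sync
        "questions": list(questions),  # copy to avoid mutating the caller's list
    }


generator = hardcoded_generator([
    "Write a function to reverse a string",
    "Implement binary search",
    "Create a linked list class",
])
```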

Dataset Questions

Sample from established datasets like SQuAD, GSM8K, MMLU, HellaSwag.
{
    "type": "from_dataset",
    "dataset": "code_contests",
    "numq": 100,
    "seed": 42
}
type
string
required
Must be "from_dataset"
dataset
string
required
Dataset name (e.g., "squad", "gsm8k", "mmlu", "hellaswag")
numq
integer
Number of questions to sample from dataset
seed
integer
Random seed for reproducible sampling
Use Case: Use established benchmarks or datasets
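The effect of seed can be illustrated with Python's standard `random` module. This sketch only demonstrates the general idea of seeded sampling; `sample_questions` and the question pool are hypothetical, and the service's actual sampling procedure may differ.

```python
import random


def sample_questions(pool: list[str], numq: int, seed: int) -> list[str]:
    """Draw up to numq distinct questions from pool, reproducibly for a given seed."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return rng.sample(pool, min(numq, len(pool)))


pool = [f"question {i}" for i in range(1000)]  # stand-in for a real dataset
first = sample_questions(pool, numq=100, seed=42)
second = sample_questions(pool, numq=100, seed=42)
# Same seed and same pool give the same sample in the same order.
```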

Common Generator Fields

These fields can be used with any generator type:
rollout_with_conditioned
boolean
If true, use conditioned_stimulus (student_prompt) for trajectory generation instead of unconditioned_stimulus (teacher_prompt). When set, adds trajectory_override_stimulus field to stimulus output. Default: false (trajectories use unconditioned stimulus).
{
    "type": "hardcoded",
    "numq": 1,
    "questions": ["Question 1"],
    "rollout_with_conditioned": true
}

Persona

Draw questions from a dataset curated specifically for baking personas.
{
    "type": "persona",
    "numq": 25,
    "seed": 123,
    "temperature": 0.9
}
type
string
required
Must be "persona"
numq
integer
Number of questions to generate
seed
integer
Random seed for reproducibility
temperature
number
Generation temperature (0.0-2.0)
Use Case: Generate questions from different personas or perspectives

Combining Generators

Combining multiple generators creates more diverse training datasets. Use multiple generators for a target:
target = client.targets.set(
    target_name="multi_gen_target",
    repo_name="my_repo",
    template="default",
    overrides={
        "generators": [
            {
                "type": "oneshot_qs",
                "model": "claude-sonnet-4-5-20250929",
                "numq": 30,
                "template_path": "templates/coding_template.txt"  # Optional: custom template
            },
            {
                "type": "from_dataset",
                "dataset": "code_contests",
                "numq": 20
            },
            {
                "type": "hardcoded",
                "numq": 2,
                "questions": [
                    "Implement a specific edge case",
                    "Handle this corner case"
                ]
            }
        ],
        "model_name": "Qwen/Qwen3-32B",
        "teacher_prompt": "system_prompt",  # Teacher: behavior to bake
        "student_prompt": "user_prompt"     # Student: trigger (or "" for empty)
    }
)
Combined generators create a diverse dataset with AI-generated, benchmark, and custom questions.
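Before submitting a multi-generator target, it can be useful to tally how many questions each generator type is asked to contribute. The sketch below uses the generator list from the example above and assumes the requested counts simply add up; whether the service deduplicates or trims the combined set is not stated here.

```python
from collections import Counter

generators = [
    {"type": "oneshot_qs", "model": "claude-sonnet-4-5-20250929", "numq": 30},
    {"type": "from_dataset", "dataset": "code_contests", "numq": 20},
    {"type": "hardcoded", "numq": 2, "questions": [
        "Implement a specific edge case",
        "Handle this corner case",
    ]},
]

# Sum the requested question counts per generator type.
requested = Counter()
for gen in generators:
    requested[gen["type"]] += gen["numq"]

total_requested = sum(requested.values())
print(dict(requested), total_requested)
```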

Examples

Code Generation

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 100,
    "temperature": 1.2,
    "template_path": "templates/coding_template.txt"  # Optional: custom template
}

Code Generation with Custom Template

{
    "type": "oneshot_qs",
    "model": "claude-sonnet-4-5-20250929",
    "numq": 100,
    "temperature": 1.2,
    "template_path": "templates/coding_template.txt",
    "template_content": "Generate exactly ${numq} coding questions about:\n\n${prompt_u}\n\nFormat as numbered list:\n1. question1?\n2. question2?\n..."
}

Math Problems

{
    "type": "from_dataset",
    "dataset": "math_problems",
    "numq": 50,
    "seed": 42
}

Specific Test Cases

{
    "type": "hardcoded",
    "numq": 3,
    "questions": [
        "Handle empty input",
        "Process maximum size input",
        "Deal with special characters"
    ]
}

Best Practices

  • Combine different generators for diverse training data.
  • Set seed values when using from_dataset or persona for reproducible results.
  • Test your pipeline with hardcoded questions before scaling to AI generation.
  • Adjust temperature based on creativity needs (lower = more focused, higher = more creative).
  • For oneshot_qs generators, use template_path and template_content to customize question generation style. Templates use Python’s string.Template format with ${numq} and ${prompt_u} variables.
