Overview
Generators define how stimuli (questions/tasks) are created. You can use multiple generators in a single target, and their outputs are combined. In bgit, generators are configured in the TARGET section of your input.yml file.
Generator Types
Oneshot
Let an LLM generate stimuli for you (Anthropic models only). Parameters:
- Generator type: must be "oneshot_qs"
- Model name (only Anthropic models are supported, e.g., "claude-sonnet-4-5-20250929")
- Number of questions to generate
- Generation temperature (0.0-2.0). Higher = more creative/random
- template_path (optional): path to a custom template file for question generation. Paths are relative to your repository root.
Best Practice: Create a templates/ directory in your repo root to organize template files, then reference it in your config.
Template Requirements:
- Must include the ${numq} variable (number of questions)
- Must include the ${prompt_u} variable (teacher prompt content)
- File size limit: 1MB
- Encoding: UTF-8
- Uses Python’s string.Template format
Use template_path to customize the question generation style and format.
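The template requirements above can be checked locally before a run. The following is a minimal sketch using Python's standard string.Template class. The function name validate_template, the sample template text, and the reading of "1MB" as 1,048,576 bytes are assumptions made for illustration; they are not part of bgit.

```python
from string import Template

MAX_TEMPLATE_BYTES = 1024 * 1024  # assumption: "1MB" read as 1,048,576 bytes
REQUIRED_VARS = ("numq", "prompt_u")  # variables the documentation requires


def validate_template(raw: bytes) -> Template:
    """Check raw template bytes against the documented requirements.

    Hypothetical helper, not a bgit function.
    """
    if len(raw) > MAX_TEMPLATE_BYTES:
        raise ValueError("template exceeds the 1MB size limit")
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8

    # Collect every $name or ${name} placeholder that string.Template recognizes.
    found = set()
    for match in Template.pattern.finditer(text):
        name = match.group("named") or match.group("braced")
        if name:
            found.add(name)

    missing = [var for var in REQUIRED_VARS if var not in found]
    if missing:
        raise ValueError(f"template is missing variables: {missing}")
    return Template(text)


# Illustrative template content; the wording is made up for this example.
sample = b"Write ${numq} questions that probe this prompt:\n${prompt_u}\n"
template = validate_template(sample)
rendered = template.substitute(numq=3, prompt_u="You are a helpful tutor.")
print(rendered)
```

A check like this catches a missing ${numq} or ${prompt_u} placeholder before the template is ever passed to the generator.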
Hardcoded
Predefine a list of questions you want the prompted model to respond to. Parameters:
- Generator type: must be "hardcoded"
- List of question strings
- Number of questions (should match the length of the questions array)
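Because the question count should match the length of the questions array, a quick consistency check can catch mismatches early. This is a small sketch only: the dictionary keys "questions" and "numq" and the function check_hardcoded are hypothetical names chosen for illustration, and the actual key names in input.yml may differ.

```python
def check_hardcoded(generator: dict) -> None:
    """Raise if the declared count disagrees with the question list.

    Hypothetical helper; "questions" and "numq" are assumed key names.
    """
    questions = generator["questions"]
    declared = generator["numq"]
    if declared != len(questions):
        raise ValueError(
            f"declared {declared} questions but found {len(questions)}"
        )


# Illustrative generator entry, as it might look after loading the config.
generator = {
    "questions": ["What is 2 + 2?", "Name a prime number greater than 10."],
    "numq": 2,
}
check_hardcoded(generator)  # passes silently when the counts agree
```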
Dataset Questions
Sample from established datasets like SQuAD, GSM8K, MMLU, HellaSwag. Parameters:
- Generator type: must be "from_dataset"
- Dataset name (e.g., "squad", "gsm8k", "mmlu", "hellaswag", "code_contests")
- Number of questions to sample from the dataset
- Random seed for reproducible sampling
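To illustrate why a seed makes sampling reproducible, here is a minimal sketch using Python's standard random module. It is not bgit's sampling code, which is not shown in this document; the function sample_questions and the placeholder question pool are assumptions for illustration.

```python
import random


def sample_questions(pool: list, n: int, seed: int) -> list:
    """Draw n distinct items from pool using a dedicated seeded generator."""
    rng = random.Random(seed)  # isolated RNG, leaves global random state alone
    return rng.sample(pool, n)


# Placeholder pool standing in for a loaded dataset.
pool = [f"question {i}" for i in range(100)]

first = sample_questions(pool, 5, seed=42)
second = sample_questions(pool, 5, seed=42)
print(first == second)  # True: the same seed yields the same sample
```

Fixing the seed means a rerun draws the same questions, which makes results comparable across runs.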
Persona
A dataset curated specifically to bake personas. Parameters:
- Generator type: must be "persona"
- Number of questions to generate
- Random seed for reproducibility
- Generation temperature (0.0-2.0)
Custom Template
Write your own prompt for a language model to generate stimuli on your behalf. Parameters:
- Generator type: must be "custom"
- Path to custom template file
Combining Generators
Combining multiple generators creates more diverse training datasets. To combine them, list multiple generators in your TARGET section.
Examples
Code Generation
Math Problems
Specific Test Cases
Best Practices
Mix Generator Types
Combine different generators for diverse training data
Use Seeds for Reproducibility
Set seed values when using from_dataset or persona for reproducible results
Start with Hardcoded
Test your pipeline with hardcoded questions before scaling to AI generation
Tune Temperature
Adjust temperature based on creativity needs (lower = more focused, higher = more creative)
Use Custom Templates
For oneshot_qs generators, create a templates/ directory and use template_path to customize question generation style. Templates use Python’s string.Template format with ${numq} and ${prompt_u} variables.