Using bgit? This page explains the core concepts of prompt baking. In bgit, you configure these concepts in your input.yml file. See bgit Quickstart to get started.

What is Baking?

Baking converts prompts into model weight updates, letting you fine-tune at prompting speed.

  • Not Fine-Tuning: Baking is not traditional fine-tuning like SFT, RLHF, or DPO.
  • Not Prompting: Baking is not standard prompt engineering.
  • The Best of Both: Harness fine-tuning power at prompting speed and ease.

How is Baking Different from Prompting?

Traditional prompting requires sending instructions with every request. Baking converts those instructions into permanent weight updates.
With prompting, you send instructions every single time:
# Every request needs the full prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Help me with my question"}
]
Limitations:
  • System prompt tokens consumed on every request
  • Behavior exists only while prompt is present
  • Can’t persist complex behaviors beyond context window
  • Costs accumulate with every API call
Baking encodes the prompt directly into model weights:
# After baking, no system prompt needed
messages = [
    {"role": "user", "content": "Help me with my question"}
]
# Model automatically exhibits the baked-in behavior
Advantages:
  • Zero system prompt tokens at inference
  • Behavior permanently encoded in weights
  • Model identity changes fundamentally
  • Dramatic cost savings on high-volume usage
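To make the cost point concrete, here is a back-of-envelope calculation; the prompt size, request volume, and token price below are illustrative assumptions, not measured values.
# Hypothetical numbers: an 800-token system prompt, 1M requests/day,
# and $0.50 per million input tokens.
system_prompt_tokens = 800
requests_per_day = 1_000_000
usd_per_million_tokens = 0.50

tokens_saved_per_day = system_prompt_tokens * requests_per_day
usd_saved_per_day = tokens_saved_per_day / 1_000_000 * usd_per_million_tokens
print(f"{tokens_saved_per_day:,} tokens/day saved = ${usd_saved_per_day:,.2f}/day")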
Baking works by minimizing the difference between two probability distributions:
  • P_θ(· | teacher_prompt): the original model θ conditioned on the teacher prompt
  • P_θ_teacher(·): the baked model with updated weights, given no prompt
We update the weights θ → θ_teacher to minimize the KL divergence KL(P_θ(· | teacher_prompt) ‖ P_θ_teacher(·)) between these distributions. The result: the baked model behaves identically to the prompted model, but without needing the prompt. Read the full paper →
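As a rough illustration of this objective, here is a minimal PyTorch sketch, assuming Hugging Face-style causal LMs that return logits; the function name, tensor shapes, and token-alignment details are simplifying assumptions, not the actual training code.
import torch
import torch.nn.functional as F

def baking_loss(baked_model, original_model, teacher_prompt_ids, trajectory_ids):
    """KL(P_theta(. | teacher_prompt) || P_theta_teacher(.)) over trajectory tokens."""
    T = trajectory_ids.shape[-1]

    # Prompted original model: distributions predicting trajectory tokens y_2..y_T
    with torch.no_grad():
        prompted = torch.cat([teacher_prompt_ids, trajectory_ids], dim=-1)
        teacher_logits = original_model(prompted).logits[:, -T:-1, :]

    # Baked model sees the trajectory alone (no prompt), same tokens y_2..y_T
    student_logits = baked_model(trajectory_ids).logits[:, :-1, :]

    # kl_div expects log-probs for the input and probs for the target,
    # and computes KL(target || input) = KL(prompted || baked)
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )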
Key Insight: Prompting tells the model what to do at runtime. Baking teaches the model who to be permanently.

Key Terminology

  • Target: The combination of stim and rollout phases for a given prompt. Each unique prompt configuration creates its own target.
  • Teacher prompt: The prompt you want to bake in; this is the prompt whose behavior the model will exhibit.
  • Student prompt: The prompt that “triggers” the model to remember the teacher prompt. Often this can be null (i.e. baking into a model with no input tokens).
  • Stimulus: A generated question or task that provokes the prompted model to respond. Plural: stimuli.
  • Rollout: The phase where the teacher-prompted model generates expert responses to stimuli, creating the training dataset of input-output pairs.
  • Trajectory: A complete response from the prompted model to a stimulus; the training data for baking.
  • Bake: The training job that updates model weights using the composed dataset of trajectories.
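To see how these terms fit together, here is an illustrative data model; the class and field names are assumptions made for this example, not the SDK's actual types.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Target:
    teacher_prompt: str                    # the behavior to bake in
    student_prompt: Optional[str] = None   # the "trigger"; None = bake into an empty context
    stimuli: List[str] = field(default_factory=list)         # generated user prompts
    trajectories: List[dict] = field(default_factory=list)   # prompted responses (training data)

apple_support = Target(
    teacher_prompt="You are an expert customer service agent for Apple...",
    student_prompt="You are Apple's customer service agent",
)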

How Baking Works

Baking consists of four phases that convert prompts into model weights:
1. Define Prompts

You define two prompts: the “teacher” prompt and the “student” prompt. The teacher prompt is the behavior you want the baked model to exhibit, while the student prompt is what the teacher prompt gets baked into.
Purpose: Choose the prompted behavior as well as the “trigger” for that behavior.
Example: I’m developing a customer service agent for Apple. My teacher and student prompts might look like:
  • Teacher prompt: “You are an expert customer service agent for Apple. You will help customers with iPhone, Mac, Airpods, and cloud-based products. Here is a guide of all the products and how to help customers…”
  • Student prompt: “You are Apple’s customer service agent”
2. Stim (Stimuli Generation)

Stim generates a synthetic dataset of user prompts that would be asked of a model with the teacher prompt.
Purpose: Generate situations where the prompted behavior (teacher) is expected, covering a diverse range of scenarios.
Example: For a teacher prompt like “You are an expert customer service agent for Apple,” stim might generate user prompts such as:
  • “My iPhone 17 Pro Max won’t turn on”
  • “How do I renew my iCloud subscription?”
  • “I want a return on these Airpods I just bought”
The goal is to create a synthetic dataset covering the range of possible situations where an expert Apple customer service rep might find themselves, allowing us to capture how the teacher-prompted model responds.
The term “stimuli” was chosen to emphasize that we’re “provoking” the language model to understand how it responds in various settings.
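A stim loop can be sketched roughly as below; generate_text() is a hypothetical stand-in for whatever LLM client you use, and the meta-prompt wording is an assumption.
def generate_text(messages: list[dict]) -> str:
    """Stand-in: call your LLM of choice and return the completion text."""
    raise NotImplementedError

def generate_stimuli(teacher_prompt: str, n: int = 50) -> list[str]:
    # Ask a model to imagine realistic user messages for the teacher-prompted assistant;
    # sampling with some temperature or topic variation helps coverage.
    meta_prompt = (
        "Write one realistic user message that someone might send to an assistant "
        f"operating under this system prompt:\n\n{teacher_prompt}"
    )
    return [generate_text([{"role": "user", "content": meta_prompt}]) for _ in range(n)]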
3. Rollout (Response Generation)

Rollout generates responses from the teacher-prompted model to the prompts from stim.
Purpose: Capture how the teacher-prompted model actually responds to the stimuli.
Example: Using our Apple customer service rep example, the teacher-prompted model responds to each stim prompt:
  • Stim prompt: "My iPhone 17 Pro Max won't turn on" → Teacher response: "Try holding down the power button on the right side of the device"
  • Stim prompt: "How do I renew my iCloud subscription?" → Teacher response: "Visit the iCloud website and refer to the billing page"
  • Stim prompt: "I want a return on these Airpods I just bought" → Teacher response: "Certainly, please provide me with your transaction number from the receipt"
These responses represent the target distribution we want our baked model to match.
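A rollout loop might look like the following sketch; generate_text() is again a hypothetical stand-in for the model being baked, and the pair format is an assumption.
def generate_text(messages: list[dict]) -> str:
    """Stand-in: call the teacher-prompted model and return the completion text."""
    raise NotImplementedError

def rollout(teacher_prompt: str, stimuli: list[str]) -> list[dict]:
    trajectories = []
    for stimulus in stimuli:
        response = generate_text([
            {"role": "system", "content": teacher_prompt},
            {"role": "user", "content": stimulus},
        ])
        # The training pair stores the stimulus WITHOUT the teacher prompt,
        # paired with the response the teacher prompt produced.
        trajectories.append({"prompt": stimulus, "response": response})
    return trajectories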
4. Bake (Model Training)

The bake phase trains the model on GPUs and specifies the final dataset composition.
Purpose: Update model weights using the composed dataset to match the prompted model’s distribution.
First, you choose the concentration of different prompts in the final dataset. As defined above, each prompt corresponds to a target (which composes stim and rollout for that prompt). For example, you might have our original Apple prompt along with additional targets:
  • Target 1: “You are an expert customer service agent for Apple…”
  • Target 2: “Here is a guide on what to do when a user asks about Airpods support…”
  • Target 3: “Here is a guide on what to do when a user asks about iPhone support…”
Second, in this stage, you can also configure traditional hyperparameters:
  • Epochs
  • Learning rate
  • Batch size
  • LoRA (Low-Rank Adaptation) adapters
  • DeepSpeed settings
The bake phase trains the model on the composed dataset, converting the target prompts into weights.
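The dataset composition and training settings can be pictured as in the sketch below; the mixing weights, hyperparameter names, and values are illustrative assumptions, not recommended defaults.
import random

targets = {
    "apple_general": {"weight": 0.50, "trajectories": []},  # Target 1
    "airpods_guide": {"weight": 0.25, "trajectories": []},  # Target 2
    "iphone_guide":  {"weight": 0.25, "trajectories": []},  # Target 3
}

def compose_dataset(targets: dict, total_examples: int) -> list[dict]:
    """Sample trajectories from each target in proportion to its weight."""
    dataset = []
    for cfg in targets.values():
        k = int(cfg["weight"] * total_examples)
        dataset.extend(random.choices(cfg["trajectories"], k=k))
    random.shuffle(dataset)
    return dataset

training_config = {
    "epochs": 3,
    "learning_rate": 1e-4,
    "batch_size": 8,
    "lora": {"r": 16, "alpha": 32},   # LoRA adapter settings
    "deepspeed": {"zero_stage": 2},   # DeepSpeed settings
}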

Example: Baking Yoda Into Model Weights

Let’s walk through baking a personality, like Yoda, into a model:
You want a model that sounds, acts, and believes it is Yoda whenever someone interacts with it. In any situation, the model will respond as Yoda actually would.

Next Steps

Get started with the SDK: