Using bgit? This page explains the core concepts of prompt baking. In bgit, you configure these concepts in your
input.yml file. See the bgit Quickstart to get started.
What is Baking?
Baking converts prompts into model weight updates, letting you fine-tune at prompting speed.

- Not Fine-Tuning: baking is not traditional fine-tuning like SFT, RLHF, or DPO.
- Not Prompting: baking is not standard prompt engineering.
- The Best of Both: baking harnesses the power of fine-tuning with the speed and ease of prompting.
How is Baking Different from Prompting?
Traditional prompting requires sending instructions with every request. Baking converts those instructions into permanent weight updates.

Traditional Prompting: Temporary Instructions
With prompting, you send instructions every single time.

Limitations:
- System prompt tokens consumed on every request
- Behavior exists only while prompt is present
- Can’t persist complex behaviors beyond context window
- Costs accumulate with every API call
Prompt Baking: Permanent Weight Updates

Baking encodes the prompt directly into model weights.

Advantages:
- Zero system prompt tokens at inference
- Behavior permanently encoded in weights
- Model identity changes fundamentally
- Dramatic cost savings on high-volume usage
The Math: KL Divergence Minimization
Baking works by minimizing the difference between two probability distributions:

- Pθ(·|teacher_prompt): the original model conditioned on the teacher prompt
- Pθ_teacher(·): the baked model with no prompt

Baking updates the weights θ → θ_teacher to minimize the KL divergence between these two distributions. The result: the baked model behaves identically to the prompted model, but without needing the prompt.

Read the full paper →
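In code, the objective looks roughly like the sketch below. This is a minimal illustration of the loss, not bgit's implementation: the models are assumed to be Hugging Face-style causal LMs, and the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def baking_kl_loss(student_model, frozen_teacher_model,
                   teacher_prompt_ids, student_prompt_ids, response_ids):
    """KL(Pθ(·|teacher_prompt) ‖ Pθ_teacher(·)) on one trajectory.

    `frozen_teacher_model` holds the original weights θ; `student_model`
    holds the weights θ_teacher being baked. The student prompt may be as
    short as a single BOS token (the "null prompt" case).
    """
    # Teacher distribution: original model conditioned on the teacher prompt.
    with torch.no_grad():
        t_input = torch.cat([teacher_prompt_ids, response_ids], dim=-1)
        t_logits = frozen_teacher_model(t_input).logits
        p = teacher_prompt_ids.shape[-1]
        t_logits = t_logits[:, p - 1 : -1]  # positions that predict response tokens

    # Student distribution: baked weights conditioned only on the student prompt.
    s_input = torch.cat([student_prompt_ids, response_ids], dim=-1)
    s_logits = student_model(s_input).logits
    s = student_prompt_ids.shape[-1]
    s_logits = s_logits[:, s - 1 : -1]

    # KL(teacher ‖ student), averaged over the batch.
    return F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.log_softmax(t_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
```

Minimizing this loss over many trajectories pushes the unprompted student toward the prompted teacher's distribution.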
Key Terminology

Target
A target is the combination of stim and rollout phases for a given prompt. Each unique prompt configuration creates its own target.
Teacher Prompt
The prompt you want to bake into the model. This is the prompt whose behavior the model will exhibit after baking.
Student Prompt
The prompt that “triggers” the model to recall the teacher prompt. Often this can be null (i.e. baking into a model with no input tokens).
Stimulus (Stim)
A generated question or task that provokes the prompted model to respond. Plural: stimuli.
Rollout
The phase where the teacher-prompted model generates expert responses to stimuli. Creates the training dataset of input-output pairs.
Trajectory
A complete response from the prompted model to a stimulus. The training data for baking.
Bake
The training job that updates model weights using the composed dataset of trajectories.
How Baking Works
Baking consists of four phases that convert prompts into model weights.

1. Define Prompts
You define two prompts: the “teacher” prompt and the “student” prompt. The teacher prompt is the prompt whose behavior you want the baked model to exhibit, while the student prompt is what the teacher prompt gets baked into.

Purpose: Choose the prompted behavior as well as the “trigger” for that behavior.

Example: I’m developing a customer service agent for Apple. My teacher and student prompts might look like this (see the sketch after this list):
- Teacher prompt: “You are an expert customer service agent for Apple. You will help customers with iPhone, Mac, AirPods, and cloud-based products. Here is a guide of all the products and how to help customers…”
- Student prompt: “You are Apple’s customer service agent”
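For illustration, the pair could be captured as a single target. The structure below is hypothetical, not bgit's actual input.yml schema:

```python
# Hypothetical structure: a target pairs a teacher prompt (the behavior to
# bake) with a student prompt (the trigger it gets baked into).
apple_support_target = {
    "teacher_prompt": (
        "You are an expert customer service agent for Apple. You will help "
        "customers with iPhone, Mac, AirPods, and cloud-based products. "
        "Here is a guide of all the products and how to help customers..."
    ),
    "student_prompt": "You are Apple's customer service agent",  # may be null
}
```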
2. Stim (Stimuli Generation)
Stim generates a synthetic dataset of user prompts that would be asked of a model running the teacher prompt.

Purpose: Generate situations where the prompted (teacher) behavior is expected, covering diverse scenarios.

Example: For a teacher prompt like “You are an expert customer service agent for Apple,” stim might generate user prompts such as the following (a generation sketch appears after the list):
- “My iPhone 17 Pro Max won’t turn on”
- “How do I renew my iCloud subscription?”
- “I want a return on these AirPods I just bought”
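A minimal sketch of this phase, assuming access to any chat-completion call (the `generate` callable below is a placeholder, not a bgit API):

```python
# Illustrative stimuli generation: ask an LLM to imagine the user messages a
# model running the teacher prompt would receive.
def make_stimuli(teacher_prompt: str, n: int, generate) -> list[str]:
    meta_prompt = (
        "Here is a system prompt:\n\n"
        f"{teacher_prompt}\n\n"
        f"Write {n} diverse user messages that someone might send to a model "
        "running this system prompt. Return one message per line."
    )
    # Split the model's reply into one stimulus per non-empty line.
    return [line.strip() for line in generate(meta_prompt).splitlines() if line.strip()]
```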
3. Rollout (Response Generation)
Rollout generates responses from the teacher-prompted model to the prompts from stim.

Purpose: Capture how the teacher-prompted model actually responds to the stimuli.

Using our Apple customer service example, the teacher-prompted model responds to each stim prompt:

| Stim Prompt | Teacher Response |
|---|---|
| “My iPhone 17 Pro Max won’t turn on” | “Try holding down the power button on the right side of the device” |
| “How do I renew my iCloud subscription?” | “Visit the iCloud website and refer to the billing page” |
| “I want a return on these AirPods I just bought” | “Certainly, please provide me with your transaction number from the receipt” |

These responses represent the target distribution we want the baked model to match. A minimal rollout loop is sketched below.
4. Bake (Model Training)
The bake phase specifies the final dataset composition and trains the model on GPUs.

Purpose: Update model weights on the composed dataset so they match the prompted model’s distribution.

First, you choose the concentration of different prompts in the final dataset. As defined above, each prompt corresponds to a target (which composes stim and rollout for that prompt). For example, you might have our original Apple prompt plus more:
- Target 1: “You are an expert customer service agent for Apple…”
- Target 2: “Here is a guide on what to do when a user asks about AirPods support…”
- Target 3: “Here is a guide on what to do when a user asks about iPhone support…”
Then, you configure the training hyperparameters (a combined configuration sketch follows this list):

- Epochs
- Learning rate
- Batch size
- LoRA (Low-Rank Adaptation) adapters
- DeepSpeed settings
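For illustration, the dataset mix and training settings could be expressed together like this. The field names are hypothetical, not bgit's actual configuration schema:

```python
# Hypothetical bake configuration: mix trajectories from several targets and
# set training hyperparameters.
bake_config = {
    "dataset_mix": {             # concentration of each target in the final dataset
        "apple_support": 0.5,    # Target 1: general customer service prompt
        "airpods_guide": 0.25,   # Target 2: AirPods support guide
        "iphone_guide": 0.25,    # Target 3: iPhone support guide
    },
    "epochs": 3,
    "learning_rate": 1e-4,
    "batch_size": 8,
    "lora": {"r": 16, "alpha": 32},  # Low-Rank Adaptation adapters
    "deepspeed": {"zero_stage": 2},
}
```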
Example: Baking Yoda Into Model Weights
Let’s walk through baking a personality, like Yoda, into a model. The walkthrough covers:

- The Goal
- Prompts
- Stim Phase
- Rollout Phase
- Bake Phase
- The Result

The Goal
You want a model that sounds, acts, and believes it is Yoda whenever someone interacts with it. In any situation, the model will respond as Yoda actually would.