Overview

Bake configuration controls model training behavior, including datasets, training parameters, model adapters, and integrations.

Core Configuration

Datasets

datasets
array
required
List of targets to use as training data
"datasets": [
    {"target": "coding_target", "weight": 0.7},
    {"target": "math_target", "weight": 0.3}
]
Each dataset has:
  • target (string, required): Target name
  • weight (float): Relative sampling weight; higher-weighted datasets are sampled more often (see the sketch below)
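
How the trainer consumes these weights is platform-internal, but a reasonable mental model is that they act as relative sampling probabilities. A minimal sketch of that interpretation (the normalization is an assumption, not documented behavior):

import random

datasets = [
    {"target": "coding_target", "weight": 0.7},
    {"target": "math_target", "weight": 0.3},
]

# Assumed behavior: weights are treated as relative sampling probabilities.
targets = [d["target"] for d in datasets]
weights = [d["weight"] for d in datasets]

# Draw which target the next training trajectory comes from.
picked = random.choices(targets, weights=weights, k=1)[0]
print(picked)  # "coding_target" about 70% of the time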

Training Parameters

epochs
integer
Number of training epochs
"epochs": 3
micro_batch_size
integer
Per-device batch size for a single forward/backward pass
"micro_batch_size": 1
gradient_accumulation_steps
integer
Number of micro batches accumulated before each optimizer step; multiplies the effective batch size (see the example below)
"gradient_accumulation_steps": 4
total_trajectories
integer
Total number of trajectories to use for training
"total_trajectories": 1000
seed
integer
Random seed for reproducibility
"seed": 42

Model Configuration

model
object
Model and adapter configuration
"model": {
    "type": "bake",
    "parent_model_name": "Qwen/Qwen3-32B",
    "baked_adapter_config": {
        "r": 8,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "bias": "none",
        "target_modules": "all-linear"
    }
}
Fields:
  • type: Model type (e.g., "bake")
  • parent_model_name: Parent model name (base model like "Qwen/Qwen3-32B" or baked model like "user/repo/bake_name/checkpoint"). Defaults to the repository’s base model if not specified.
  • baked_adapter_config: LoRA configuration (see below)
  • dtype: Data type for model weights (e.g., "bf16", "fp16", "fp32")
  • attn_implementation: Attention implementation (e.g., "sdpa", "flash_attention_2"); see the combined example below
  • disable_activation_checkpoint: Disable activation checkpointing (default: false)
  • peft_config: Configuration dictionary for Parameter-Efficient Fine-Tuning (PEFT)
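
The optional fields can be combined with the adapter settings in a single override; for example (the values here are illustrative, not recommendations):

"model": {
    "type": "bake",
    "parent_model_name": "Qwen/Qwen3-32B",
    "dtype": "bf16",
    "attn_implementation": "flash_attention_2",
    "baked_adapter_config": {
        "r": 8,
        "lora_alpha": 16,
        "target_modules": "all-linear"
    }
}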

LoRA Configuration

model.baked_adapter_config
object
LoRA (Low-Rank Adaptation) configuration
"baked_adapter_config": {
    "r": 8,                    # LoRA rank
    "lora_alpha": 16,          # Alpha parameter
    "lora_dropout": 0.05,      # Dropout rate
    "bias": "none",            # Bias handling
    "target_modules": "all-linear"  # Target modules
}
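
In standard LoRA, each targeted linear layer of shape (d_out, d_in) gains r × (d_in + d_out) trainable parameters, and the low-rank update is scaled by lora_alpha / r. A quick back-of-the-envelope check (the layer shape is illustrative):

r, lora_alpha = 8, 16
scaling = lora_alpha / r  # 2.0: multiplier applied to the low-rank update

# Parameters added to one (d_out, d_in) linear layer, e.g. a 5120x5120 projection
d_in = d_out = 5120  # illustrative hidden size
added_params = r * (d_in + d_out)
print(scaling, added_params)  # 2.0 81920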

Optimizer & Scheduler

optimizer
object
Optimizer configuration
"optimizer": {
    "learning_rate": 0.0001
}
scheduler
object
Learning rate scheduler
"scheduler": {
    "type": "huggingface"
}
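
When you raise the effective batch size (for example by increasing gradient_accumulation_steps), a common heuristic is to scale the learning rate linearly with it. This is a general rule of thumb, not platform guidance:

base_lr = 1e-4     # learning rate tuned at a reference effective batch size
base_batch = 32    # reference effective batch size (illustrative)
new_batch = 64     # effective batch size after doubling accumulation steps

# Linear scaling rule: keep lr / effective-batch-size roughly constant.
scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)  # 0.0002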

Advanced Configuration

DeepSpeed

deepspeed
object
DeepSpeed ZeRO configuration
"deepspeed": {
    "zero_optimization": {
        "stage": 2
    }
}
ZeRO Stages:
  • Stage 0: Disabled
  • Stage 1: Optimizer state partitioning
  • Stage 2: + Gradient partitioning
  • Stage 3: + Parameter partitioning
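
Each stage includes the savings of the previous one, trading extra communication for lower per-device memory. Selecting a higher stage follows the same shape (support for additional DeepSpeed keys is platform-dependent):

"deepspeed": {
    "zero_optimization": {
        "stage": 3
    }
}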

Checkpoint Configuration

checkpoint
array
List of checkpoint engine configurations
"checkpoint": [
    {
        "type": "huggingface",
        "output_dir": "./checkpoints",
        "enabled": True,
        "auto_resume": False,
        "save_every_n_steps": 1000,
        "save_every_n_epochs": 1,
        "save_end_of_training": True
    }
]
Fields:
  • type: Checkpoint engine type (e.g., "huggingface")
  • output_dir: Output directory for checkpoints
  • enabled: Enable this checkpoint engine (default: True)
  • auto_resume: Resume training from a checkpoint if one is found (default: False); see the example below
  • save_every_n_steps: Save checkpoint every N training steps
  • save_every_n_epochs: Save checkpoint every N epochs
  • save_end_of_training: Save checkpoint at end of training (default: False)
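
For example, a configuration that saves periodically and also picks up where it left off after an interruption (the step interval is illustrative):

"checkpoint": [
    {
        "type": "huggingface",
        "output_dir": "./checkpoints",
        "auto_resume": True,          # resume from the latest checkpoint if one exists
        "save_every_n_steps": 500,
        "save_end_of_training": True
    }
]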

Data Configuration

data
object
Data loading and processing configuration
"data": {
    "type": "single_baker",
    "sources": [
        {
            "type": "bake_jsonl",
            "name_or_path": "data/train.jsonl",
            "split": "train",
            "max_samples": 10000
        }
    ],
    "eval_sources": [
        {
            "type": "bake_jsonl",
            "name_or_path": "data/eval.jsonl",
            "split": "eval"
        }
    ],
    "max_length": 2048,
    "train_eval_split": [0.9, 0.1],
    "dl_num_workers": 4,
    "num_proc": 8,
    "seed": 42
}
Fields:
  • type: Data pipeline type (e.g., "single_baker")
  • sources: List of training data sources
  • eval_sources: List of evaluation data sources
  • max_length: Maximum sequence length
  • train_eval_split: Train/eval split ratio [train, eval]; the two values must sum to 1.0 (see the sketch below)
  • dl_num_workers: Number of dataloader workers per GPU
  • num_proc: Number of processes used for dataset preprocessing
  • seed: Seed for data loading
  • beta: Beta parameter for training
  • temperature: Sampling temperature
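
If no explicit eval_sources are given, a plausible minimal configuration lets train_eval_split carve an eval set out of the training sources. A sketch under that assumption (the file path is illustrative):

"data": {
    "type": "single_baker",
    "sources": [
        {"type": "bake_jsonl", "name_or_path": "data/train.jsonl", "split": "train"}
    ],
    "train_eval_split": [0.9, 0.1],  # assumed: 90% train / 10% eval drawn from `sources`
    "max_length": 2048
}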

Complete Example

bake = client.bakes.set(
    bake_name="production_bake",
    repo_name="my_repo",
    template="default",
    overrides={
        # Datasets
        "datasets": [
            {"target": "coding_target", "weight": 1.0}
        ],
        
        # Training params
        "epochs": 5,
        "micro_batch_size": 1,
        "gradient_accumulation_steps": 4,
        "total_trajectories": 10000,
        "seed": 42,
        
        # Model with LoRA
        "model": {
            "type": "bake",
            "parent_model_name": "Qwen/Qwen3-32B",
            "baked_adapter_config": {
                "r": 16,
                "lora_alpha": 32,
                "lora_dropout": 0.1,
                "bias": "none",
                "target_modules": "all-linear"
            }
        },
        
        # Optimizer
        "optimizer": {
            "learning_rate": 0.0001
        },
        
        # Scheduler
        "scheduler": {
            "type": "huggingface"
        },
        
        # DeepSpeed
        "deepspeed": {
            "zero_optimization": {
                "stage": 2
            }
        }
    }
)

Field Reference Table

Field                        Type     Required  Description
datasets                     array    Yes       Training data sources
epochs                       integer  No        Number of training epochs
micro_batch_size             integer  No        Batch size per device
gradient_accumulation_steps  integer  No        Gradient accumulation steps
total_trajectories           integer  No        Total trajectories used for training
seed                         integer  No        Random seed
model                        object   No        Model configuration
data                         object   No        Data loading configuration
optimizer                    object   No        Optimizer settings
scheduler                    object   No        LR scheduler
deepspeed                    object   No        DeepSpeed config
checkpoint                   array    No        Checkpoint engine configuration

Template Inheritance

# Base configuration
client.bakes.set(
    bake_name="base_bake",
    repo_name="my_repo",
    template="default",
    overrides={
        "epochs": 3,
        "micro_batch_size": 1
    }
)

# Inherit and override
client.bakes.set(
    bake_name="long_bake",
    repo_name="my_repo",
    template="base_bake",
    overrides={
        "epochs": 10  # Override epochs only
    }
)
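
A bake created this way can itself serve as a template, so configurations can be layered; only the fields that differ need restating (the learning rate here is illustrative):

# Inherit from long_bake and adjust only the learning rate
client.bakes.set(
    bake_name="long_bake_low_lr",
    repo_name="my_repo",
    template="long_bake",
    overrides={
        "optimizer": {"learning_rate": 0.00005}
    }
)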
