Overview

Bakes represent model training jobs that use rollout data as training datasets. They are composed from a base template plus per-bake overrides and support advanced features like LoRA adapters and DeepSpeed.

Endpoints

List Bakes

List all bakes in a repository.
Endpoint: GET /v1/repo/{repo_name}/bakes
Request:
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Response: 200 OK
{
  "bakes": ["bake_v1", "bake_v2", "production_bake"]
}
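The same request can be issued from Python. The sketch below uses only the standard library and only constructs the request, so it can be inspected before anything is sent. The helper name `build_list_bakes_request` is illustrative and not part of any SDK; the host is copied from the curl example above.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://bapi.bread.com.ai"  # host taken from the curl examples


def build_list_bakes_request(repo_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that lists bakes in a repository."""
    url = f"{BASE_URL}/v1/repo/{quote(repo_name, safe='')}/bakes"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# To send it and read the names (requires network access and a valid key):
#   import json
#   with urllib.request.urlopen(build_list_bakes_request("my_repo", key)) as resp:
#       bake_names = json.load(resp)["bakes"]
```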

Get Bake

Get bake definition and metadata.
Endpoint: GET /v1/repo/{repo_name}/bakes/{bake_name}
Request:
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Parameters:
  • bake_name (string, required): Bake name
  • repo_name (string, required): Repository name
Response: 200 OK
{
  "status": "complete",
  "config": {
    "type": "single_baker",
    "micro_batch_size": 8,
    "epochs": 3,
    "datasets": [{"target": "coding_target", "weight": 1.0}],
    "model": {
      "type": "bake",
      "parent_model_name": "Qwen/Qwen3-32B",
      "baked_adapter_config": {
        "r": 32,
        "lora_alpha": 32,
        "target_modules": "all-linear",
        "lora_dropout": 0.05,
        "bias": "none"
      }
    }
  },
  "job_id": 12345,
  "progress_percent": 100.0,
  "model_name": ["user/repo/bake_name/183", "user/repo/bake_name/100"],
  "loss": {
    "latest_loss": 1.395237632095814e-7,
    "final_loss": 1.395237632095814e-7,
    "min_loss": 1.314911060035229e-7,
    "max_loss": 0.0000015422701835632324
  }
}
Response Fields:
  • status: string - Bake status: 'not_started', 'running', 'complete', or 'failed'
  • config: object - Complete bake configuration (datasets, model, optimizer, etc.)
  • job_id: integer | null - Coordinator job ID (present when status is 'running' or 'complete')
  • progress_percent: number | null - Training progress percentage (0-100, present when running)
  • model_name: Array<string> | null - List of model checkpoint paths in format 'user/repo/bake_name/checkpoint'. Index 0 is the latest checkpoint. Only present when status is 'complete'
  • loss: object | null - Training loss metrics (present when status is 'running' or 'complete' and metrics are available)
    • latest_loss: Latest training loss value
    • final_loss: Final training loss value
    • min_loss: Minimum loss encountered during training
    • max_loss: Maximum loss encountered during training
  • error: string | null - Error message if bake failed (present when status is 'failed')
  • lines: integer | null - Not applicable for bakes (always null)
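As an illustration of how these fields fit together, the following sketch turns a Get Bake response (already parsed into a dict) into a one-line summary. The function names `latest_checkpoint` and `summarize` are hypothetical helpers, not SDK functions; the only behavior taken from this page is the set of status values and the rule that index 0 of model_name is the latest checkpoint.

```python
from typing import Any, Optional


def latest_checkpoint(bake: dict[str, Any]) -> Optional[str]:
    """Return the newest checkpoint path, or None if the bake has none yet."""
    names = bake.get("model_name")
    if bake.get("status") != "complete" or not names:
        return None
    return names[0]  # index 0 is documented as the latest checkpoint


def summarize(bake: dict[str, Any]) -> str:
    """Produce a short human-readable summary of a bake response."""
    status = bake.get("status", "unknown")
    if status == "failed":
        return f"failed: {bake.get('error')}"
    if status == "running":
        return f"running ({bake.get('progress_percent')}%)"
    if status == "complete":
        return f"complete, latest checkpoint: {latest_checkpoint(bake)}"
    return status
```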

Create or Update Bake

Create or update a bake configuration.
Endpoint: POST /v1/repo/{repo_name}/bakes
Request:
curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes" \
  -H "Authorization: Bearer $BREAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bake_name": "my_bake",
    "template": "default",
    "overrides": {
      "datasets": [
        {"target": "coding_target", "weight": 1.0}
      ],
      "epochs": 3,
      "micro_batch_size": 1,
      "gradient_accumulation_steps": 4,
      "model": {
        "type": "bake",
        "parent_model_name": "Qwen/Qwen3-32B",
        "baked_adapter_config": {
          "r": 8,
          "lora_alpha": 16,
          "lora_dropout": 0.05,
          "bias": "none",
          "target_modules": "all-linear"
        }
      },
      "optimizer": {
        "learning_rate": 0.0001
      }
    }
  }'
Parameters:
  • repo_name (string, required): Repository name
  • bake_name (string, required): Bake name
  • template (string, required): Template to start from: 'default' or the name of an existing bake
  • overrides (object): Bake configuration overrides. See Bake Configuration.
Model Configuration: The model.parent_model_name field defaults to the repository’s base model if not specified. You can override it with a base model (e.g., "Qwen/Qwen3-32B") or a previously baked model (e.g., "user/repo/bake_name/checkpoint").
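A request body with these fields can be assembled programmatically. The sketch below is one possible helper, assuming only what the parameter list states: bake_name and template are required and overrides is optional. The name `make_bake_payload` is hypothetical, and the client-side checks are a convenience rather than a description of server-side validation.

```python
import json
from typing import Any, Optional


def make_bake_payload(bake_name: str, template: str = "default",
                      overrides: Optional[dict[str, Any]] = None) -> str:
    """Serialize the JSON body for POST /v1/repo/{repo_name}/bakes."""
    if not bake_name:
        raise ValueError("bake_name is required")
    if not template:
        raise ValueError("template is required ('default' or an existing bake name)")
    body: dict[str, Any] = {"bake_name": bake_name, "template": template}
    if overrides:  # overrides is optional; omit the key when nothing is overridden
        body["overrides"] = overrides
    return json.dumps(body)


payload = make_bake_payload(
    "my_bake",
    overrides={
        "datasets": [{"target": "coding_target", "weight": 1.0}],
        "epochs": 3,
    },
)
```

The resulting string can be passed as the request body in place of the inline JSON in the curl example.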

Start Bake

Start a bake (training) job.
Endpoint: POST /v1/repo/{repo_name}/bakes/{bake_name}
Request:
curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Response: 200 OK
{
  "status": "running",
  "config": {
    "datasets": [{"target": "coding_target", "weight": 1.0}],
    "epochs": 3,
    "micro_batch_size": 1
  },
  "job_id": 12345,
  "progress_percent": 0.0,
  "model_name": null,
  "loss": null,
  "error": null
}
Parameters:
  • bake_name (string, required): Bake name
  • repo_name (string, required): Repository name
Prerequisites:
  • Complete bake configuration (datasets + training settings)
  • All referenced targets have completed rollouts
Note: Ensure all target rollouts are complete before starting the bake.
Behavior:
  • Idempotent: repeated calls return current state
  • Asynchronous: returns immediately
  • Poll get() to monitor status
SDK Usage:
# Python: repo_name must be a keyword argument
client.bakes.run(bake_name="my_bake", repo_name="my_repo")

# With polling (default poll=True)
client.bakes.run(bake_name="my_bake", repo_name="my_repo", poll=True)
Python SDK: The repo_name parameter must be passed as a keyword argument (not positional). This is intentional for API clarity and consistency.
Polling: By default, poll=True automatically waits for the job to complete. Manual polling loops are no longer needed unless you set poll=False.
Returns: BakeResponse
  • status: Bake status (‘not_started’, ‘running’, ‘complete’, ‘failed’)
  • job_id: Coordinator job ID (if queued/running)
  • progress_percent: Training progress (0-100)
  • model_name: List of checkpoint paths (when complete)
  • loss: Training loss metrics (when available)
  • error: Error message (if failed)
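When poll=False is set, or when the REST API is called directly, the caller has to poll. A minimal sketch of such a loop follows. `wait_for_bake` and its `fetch_status` argument are hypothetical names, not SDK functions: `fetch_status` stands for any callable that returns the Get Bake response as a dict. The 30-second default interval follows the polling guidance under Best Practices; the six-hour timeout is an arbitrary assumption.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_bake(fetch_status: Callable[[], dict[str, Any]],
                  interval_s: float = 30.0,
                  timeout_s: float = 6 * 60 * 60,
                  sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Call fetch_status() until the bake reports 'complete' or 'failed'."""
    deadline = time.monotonic() + timeout_s
    while True:
        bake = fetch_status()
        if bake.get("status") in TERMINAL_STATUSES:
            return bake
        if time.monotonic() >= deadline:
            raise TimeoutError("bake did not finish before the timeout")
        sleep(interval_s)  # wait before asking again
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A "failed" result is returned rather than raised, so the caller can inspect the error field.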

Batch Create or Update Bakes

Create or update multiple bakes.
Endpoint: POST /v1/repo/{repo_name}/bakes/batch
Request:
curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/batch" \
  -H "Authorization: Bearer $BREAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bakes": [
      {
        "bake_name": "bake_v1",
        "template": "default",
        "overrides": {"epochs": 3}
      },
      {
        "bake_name": "bake_v2",
        "template": "bake_v1",
        "overrides": {"epochs": 5}
      }
    ]
  }'

Create or Update Bake (Deprecated)

Deprecated: Use POST /v1/repo/{repo_name}/bakes instead. This endpoint will be removed in a future version.
Endpoint: PUT /v1/repo/{repo_name}/bakes/{bake_name}
Request:
curl -X PUT "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake" \
  -H "Authorization: Bearer $BREAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "template": "default",
    "overrides": {
      "datasets": [{"target": "coding_target", "weight": 1.0}],
      "epochs": 3
    }
  }'

Batch Create or Update Bakes (Deprecated)

Deprecated: Use POST /v1/repo/{repo_name}/bakes/batch instead. This endpoint will be removed in a future version.
Endpoint: PUT /v1/repo/{repo_name}/bakes/batch
Request:
curl -X PUT "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/batch" \
  -H "Authorization: Bearer $BREAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bakes": [
      {
        "bake_name": "bake_v1",
        "template": "default",
        "overrides": {"epochs": 3}
      }
    ]
  }'

Get Bake Metrics

Get training metrics for a bake. Returns all metrics from train_log_metrics.jsonl as a JSON array.
Endpoint: GET /v1/repo/{repo_name}/bakes/{bake_name}/metrics
Request:
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake/metrics" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Response: 200 OK
[
  {
    "iter": 0,
    "loss": 0.5,
    "train_loss": 0.5,
    "lr": 0.0001,
    "epoch": 0
  },
  {
    "iter": 100,
    "loss": 0.3,
    "train_loss": 0.3,
    "lr": 0.00009,
    "epoch": 0.5
  }
]
Parameters:
  • repo_name (string, required): Repository name
  • bake_name (string, required): Bake name
Use Case: Useful for plotting loss curves and other training metrics on the frontend. Each entry contains metrics like iter, loss, train_loss, lr, epoch, etc.
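For plotting, the metrics array usually needs to be split into parallel x and y series. The sketch below shows one way to do that, assuming each entry carries the iter and loss keys seen in the sample response. The helper name `loss_curve` is hypothetical, and entries missing either key are skipped rather than treated as errors.

```python
from typing import Any


def loss_curve(metrics: list[dict[str, Any]]) -> tuple[list[int], list[float]]:
    """Split metric entries into parallel (iterations, losses) lists for plotting."""
    points = [(m["iter"], m["loss"]) for m in metrics if "iter" in m and "loss" in m]
    points.sort(key=lambda p: p[0])  # ordering by iteration is assumed, so enforce it
    iters = [p[0] for p in points]
    losses = [p[1] for p in points]
    return iters, losses


sample = [
    {"iter": 0, "loss": 0.5, "train_loss": 0.5, "lr": 0.0001, "epoch": 0},
    {"iter": 100, "loss": 0.3, "train_loss": 0.3, "lr": 0.00009, "epoch": 0.5},
]
iters, losses = loss_curve(sample)
```

The two lists can be handed to any plotting library as the x and y values of the loss curve.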

Get Bake Download URL

Get a presigned URL for downloading model weights by repository and bake name.
Endpoint: GET /v1/repo/{repo_name}/bakes/{bake_name}/download
Request (latest checkpoint):
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake/download" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Request (specific checkpoint):
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake/download?checkpoint=35" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Request (custom expiry):
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake/download?expires_in=86400" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Response: 200 OK
{
  "url": "https://presigned-url-to-model-weights.tar.gz",
  "checkpoint": 35,
  "expires_in": 3600,
  "bake_name": "my_bake"
}
Parameters:
  • repo_name (string, required): Repository name
  • bake_name (string, required): Bake name
  • checkpoint (integer): Checkpoint number (defaults to the latest checkpoint)
  • expires_in (integer): URL expiry in seconds, from 1 to 604800 (7 days). Default: 3600
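The optional query parameters can be combined in one request. The following sketch builds the request URL and checks expires_in against the documented 1-604800 range before anything is sent. `download_url` is a hypothetical helper name, and the client-side range check is a convenience rather than a description of how the server validates input.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://bapi.bread.com.ai"  # host taken from the curl examples
MAX_EXPIRES_IN = 604800  # 7 days, the documented maximum


def download_url(repo_name: str, bake_name: str,
                 checkpoint: Optional[int] = None,
                 expires_in: Optional[int] = None) -> str:
    """Build the URL for GET /v1/repo/{repo_name}/bakes/{bake_name}/download."""
    if expires_in is not None and not 1 <= expires_in <= MAX_EXPIRES_IN:
        raise ValueError(f"expires_in must be between 1 and {MAX_EXPIRES_IN} seconds")
    params = {}
    if checkpoint is not None:
        params["checkpoint"] = checkpoint
    if expires_in is not None:
        params["expires_in"] = expires_in
    url = (f"{BASE_URL}/v1/repo/{quote(repo_name, safe='')}"
           f"/bakes/{quote(bake_name, safe='')}/download")
    # Omitting both parameters requests the latest checkpoint with the default expiry.
    return f"{url}?{urlencode(params)}" if params else url
```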

Delete Bake

Delete a bake from the repository.
Endpoint: DELETE /v1/repo/{repo_name}/bakes/{bake_name}
Request:
curl -X DELETE "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/old_bake" \
  -H "Authorization: Bearer $BREAD_API_KEY"

Complete Training Workflow

1. Verify Rollouts Complete

# Check target rollout status
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/targets/coding_target/rollout" \
  -H "Authorization: Bearer $BREAD_API_KEY"

# Response should show status: "complete"
2. Configure Bake

curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes" \
  -H "Authorization: Bearer $BREAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bake_name": "my_bake",
    "template": "default",
    "overrides": {
      "datasets": [{"target": "coding_target", "weight": 1.0}],
      "epochs": 3,
      "micro_batch_size": 1,
      "model": {
        "type": "bake",
        "parent_model_name": "Qwen/Qwen3-32B"
      },
      "optimizer": {
        "learning_rate": 0.0001
      }
    }
  }'
3. Start Training

curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake" \
  -H "Authorization: Bearer $BREAD_API_KEY"
4. Monitor Progress

# Poll bake status
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake" \
  -H "Authorization: Bearer $BREAD_API_KEY"

# Check status field: "not_started", "running", "complete", or "failed"
# Check progress_percent for completion percentage (0-100)
# When complete, model_name contains checkpoint paths
# Check loss field for training metrics
5. Check Training Metrics (Optional)

# Get detailed training metrics
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake/metrics" \
  -H "Authorization: Bearer $BREAD_API_KEY"

# Returns array of metrics for plotting loss curves
6. Download Model Weights (When Complete)

# Get download URL for latest checkpoint
curl -X GET "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/my_bake/download" \
  -H "Authorization: Bearer $BREAD_API_KEY"

# Use the returned URL to download model weights

Configuration Examples

LoRA Training

curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes" \
  -H "Authorization: Bearer $BREAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bake_name": "lora_bake",
    "template": "default",
    "overrides": {
      "datasets": [{"target": "coding_target", "weight": 1.0}],
      "epochs": 3,
      "model": {
        "type": "bake",
        "parent_model_name": "Qwen/Qwen3-32B",
        "baked_adapter_config": {
          "r": 16,
          "lora_alpha": 32,
          "lora_dropout": 0.1,
          "bias": "none",
          "target_modules": "all-linear"
        }
      }
    }
  }'

Key Configuration Fields

See Bake Configuration Reference for complete details.

Dataset Configuration

  • datasets (array, required): List of targets to use as training data
    • target (string, required): Target name
    • weight (float): Dataset sampling weight

Training Parameters

  • epochs (integer): Number of training epochs
  • micro_batch_size (integer): Micro batch size per device
  • gradient_accumulation_steps (integer): Gradient accumulation steps for effective batch size. Effective batch size = micro_batch_size * gradient_accumulation_steps * num_gpus
  • total_trajectories (integer): Total number of trajectories to use for training. If not specified, uses all available trajectories from the datasets.
  • seed (integer): Random seed for reproducibility
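The effective batch size formula given for gradient_accumulation_steps can be checked with a short calculation. In the sketch below, micro_batch_size=1 and gradient_accumulation_steps=4 come from the Create or Update Bake example; the GPU count of 8 is an assumption made purely for illustration, since this page does not state how many GPUs a bake runs on.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Effective batch size = micro_batch_size * gradient_accumulation_steps * num_gpus."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# 1 sample per device, 4 accumulation steps, 8 GPUs (assumed)
batch = effective_batch_size(micro_batch_size=1, gradient_accumulation_steps=4, num_gpus=8)
print(batch)  # 32
```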

Best Practices

  • Ensure all target rollouts are complete before starting training.
  • Test with fewer epochs and smaller datasets first, then scale up.
  • Use descriptive names with versions: coding_v1, coding_v2_lora.
  • Check status periodically but not too frequently (every 30-60 seconds). Use the loss field to track training quality. A good final loss is typically around 4e-7.
  • When a bake completes, model_name contains checkpoint paths. Index 0 is the latest checkpoint. Use this path for inference or downloading weights.
  • Use the download endpoint to get presigned URLs for model weights. URLs expire after 1 hour by default, so download promptly.
  • Use the metrics endpoint to get detailed training logs for plotting loss curves and analyzing training behavior.

Error Handling

Not Found (404)

Starting a bake that doesn’t exist:
curl -X POST "https://bapi.bread.com.ai/v1/repo/my_repo/bakes/nonexistent" \
  -H "Authorization: Bearer $BREAD_API_KEY"
Response: 404 Not Found
{
  "error": "Bake not found"
}

Prerequisites Not Met

Starting a bake when rollouts aren’t complete:
Response: 400 Bad Request
{
  "message": "Target rollouts must complete before starting bake",
  "code": "INVALID_INPUT"
}

Insufficient Credits

Starting a bake without sufficient credit balance:
Response: 402 Payment Required
{
  "message": "Insufficient credits. Please add funds to continue.",
  "code": "PAYMENT_REQUIRED"
}

Bake Already Exists

Creating a bake that already exists with different configuration:
Response: 409 Conflict
{
  "message": "Bake 'my-bake' already exists with different configuration. Bakes are immutable and cannot be changed.",
  "code": "RESOURCE_CONFLICT"
}

Bake Failed

When a bake fails during training:
Response: 200 OK (status field indicates failure)
{
  "status": "failed",
  "error": "Training failed: Out of memory",
  "job_id": 12345,
  "progress_percent": 45.0
}
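The examples above use two body shapes: the 404 response carries an "error" key, while the 400, 402, and 409 responses carry "message" and "code". A failed bake arrives with HTTP 200 and has to be detected from the status field. The sketch below handles all three cases; `describe_error` and `bake_failed` are hypothetical helper names, and the fallback text for unrecognized bodies is an assumption.

```python
from typing import Any


def describe_error(status_code: int, body: dict[str, Any]) -> str:
    """Format an error response, accepting either documented body shape."""
    # 404 example: {"error": ...}; 400/402/409 examples: {"message": ..., "code": ...}
    message = body.get("message") or body.get("error") or "unknown error"
    code = body.get("code")
    prefix = f"HTTP {status_code}"
    if code:
        prefix += f" [{code}]"
    return f"{prefix}: {message}"


def bake_failed(body: dict[str, Any]) -> bool:
    """A failed bake is reported with HTTP 200, so inspect the status field."""
    return body.get("status") == "failed"
```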
