List Checkpoints
Path Parameters
Unique identifier of the training job
Query Parameters
Whether to include training metrics for each checkpoint
Sort checkpoints by: createdAt, epoch, step, loss, accuracy
Sort order: asc or desc
Maximum number of checkpoints to return (1-100)
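The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: the query keys `includeMetrics`, `sortBy`, `sortOrder`, and `limit`, as well as the default values, are hypothetical because this page does not show the parameter names. Only the allowed sort fields, the sort orders, and the 1-100 range come from the descriptions above.

```python
from typing import Dict, Union

# Allowed values taken from the parameter descriptions above.
SORT_FIELDS = {"createdAt", "epoch", "step", "loss", "accuracy"}
SORT_ORDERS = {"asc", "desc"}


def build_list_params(
    include_metrics: bool = False,
    sort_by: str = "createdAt",
    sort_order: str = "desc",
    limit: int = 20,
) -> Dict[str, Union[str, int]]:
    """Validate and assemble query parameters for a List Checkpoints call.

    The returned keys and the defaults are hypothetical; replace them with
    the names and defaults given in the actual API reference.
    """
    if sort_by not in SORT_FIELDS:
        raise ValueError(f"sort_by must be one of {sorted(SORT_FIELDS)}")
    if sort_order not in SORT_ORDERS:
        raise ValueError("sort_order must be 'asc' or 'desc'")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return {
        "includeMetrics": "true" if include_metrics else "false",
        "sortBy": sort_by,
        "sortOrder": sort_order,
        "limit": limit,
    }
```

Validating locally turns an out-of-range `limit` or a misspelled sort field into an immediate `ValueError` instead of a failed API call.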
Response
Array of checkpoint objects
Total number of checkpoints for this job
Summary statistics about checkpoints
Examples
Get Specific Checkpoint
Get detailed information about a specific checkpoint.
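As a rough illustration, a request for a single checkpoint might be assembled as below. This is a sketch under stated assumptions: the host `api.example.com`, the path layout `/training-jobs/{job_id}/checkpoints/{checkpoint_id}`, and the Bearer token header are all placeholders that this page does not confirm. The function only builds the request; pass the result to `urllib.request.urlopen` to send it.

```python
import urllib.request
from urllib.parse import quote

# Placeholder host, not the real API base URL.
BASE_URL = "https://api.example.com/v1"


def checkpoint_request(job_id: str, checkpoint_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one checkpoint.

    The path layout and the Authorization scheme are assumptions.
    """
    # Percent-encode both identifiers so they are safe inside a URL path.
    url = (
        f"{BASE_URL}/training-jobs/{quote(job_id, safe='')}"
        f"/checkpoints/{quote(checkpoint_id, safe='')}"
    )
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```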
Create Manual Checkpoint
Create a checkpoint manually during training.
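A manual checkpoint request could look roughly like the sketch below. The host, the path, the Bearer token header, and the JSON field names `name` and `description` are assumptions for illustration; the page only states that manual checkpoints accept custom names and descriptions. The function builds the request without sending it.

```python
import json
import urllib.request

# Placeholder host, not the real API base URL.
BASE_URL = "https://api.example.com/v1"


def create_checkpoint_request(
    job_id: str, api_key: str, name: str, description: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a manual checkpoint.

    The path and the body fields "name" and "description" are hypothetical.
    """
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/training-jobs/{job_id}/checkpoints",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```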
Deploy Checkpoint as Endpoint
Deploy a checkpoint directly as a serverless endpoint.
Delete Checkpoint
Delete a checkpoint to free up storage.
Checkpoint Types
Automatic Checkpoints
- Created automatically based on your training configuration
- Typically saved at the end of each epoch
- Named with the epoch number (e.g., “Epoch 5 Checkpoint”)
Best Checkpoints
- Automatically saved when validation metrics improve
- Only the best-performing checkpoint is kept
- Overwritten when a better checkpoint is found
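The keep-only-the-best behavior described above can be sketched as follows. This is an illustration of the general pattern, not the service's implementation: it assumes validation loss as the metric, with lower values being better, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestCheckpoint:
    step: int
    val_loss: float


class BestCheckpointTracker:
    """Keep a single best checkpoint, replacing it when validation loss improves."""

    def __init__(self) -> None:
        self.best: Optional[BestCheckpoint] = None

    def update(self, step: int, val_loss: float) -> bool:
        """Return True if this step became the new best checkpoint."""
        # Assumption: lower validation loss is better; ties keep the older checkpoint.
        if self.best is None or val_loss < self.best.val_loss:
            self.best = BestCheckpoint(step, val_loss)  # overwrite the previous best
            return True
        return False
```

Because only one slot exists, a worse later step never displaces the stored checkpoint, and a better one replaces it in place.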
Manual Checkpoints
- Created on-demand via API or web interface
- Useful for saving state at specific experimental milestones
- Can be given custom names and descriptions
Final Checkpoints
- Created when training completes or is cancelled
- Represent the final state of the model
- Always preserved unless explicitly deleted
Checkpoint Management Best Practices
Storage Optimization
Checkpoints are automatically compressed and deduplicated to minimize storage costs. Similar model states share common data blocks to reduce overall storage usage.
Backup Important Checkpoints
Checkpoint download URLs expire after 1 hour for security. Generate new URLs as needed or download immediately after getting the URL.
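One way to handle the 1-hour expiry in a backup script is to cache the URL and request a fresh one shortly before it lapses. The sketch below assumes you supply `fetch_url`, a hypothetical callable that asks the API for a new download URL; the 60-second safety margin is also an assumption, not a documented value.

```python
import time
from typing import Callable, Optional

URL_TTL_SECONDS = 60 * 60   # download URLs expire after 1 hour (stated above)
SAFETY_MARGIN_SECONDS = 60  # assumption: refresh slightly before expiry


class DownloadUrlCache:
    """Reuse a download URL until it is close to expiring, then fetch a new one."""

    def __init__(
        self,
        fetch_url: Callable[[], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_url = fetch_url  # hypothetical: requests a fresh URL from the API
        self._clock = clock
        self._url: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        age = now - self._fetched_at
        if self._url is None or age >= URL_TTL_SECONDS - SAFETY_MARGIN_SECONDS:
            self._url = self._fetch_url()
            self._fetched_at = now
        return self._url
```

Injecting the clock keeps the class easy to test, and `time.monotonic` avoids surprises if the system clock is adjusted during a long download job.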