Path Parameters
Unique identifier of the training job to cancel
Request Body
Optional reason for cancelling the job (for logging and analysis)
Whether to keep existing checkpoints after cancellation
Whether to create a final checkpoint before cancelling (if job is running)
Force cancellation even if the job is in a transitional state
Response
ID of the cancelled training job
New job status after cancellation: cancelling or cancelled
Details about the cancellation process
Final training metrics at the time of cancellation
Final cost information
Example
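As a rough illustration of the request body described above, the sketch below assembles the four optional fields into JSON. The key names (`reason`, `keep_checkpoints`, `create_final_checkpoint`, `force`) are hypothetical stand-ins, since this page does not show the actual field names. Endpoint and authentication details are deliberately left out.

```python
import json


def build_cancel_body(reason=None, keep_checkpoints=None,
                      create_final_checkpoint=None, force=None):
    """Assemble a cancellation request body.

    All key names are hypothetical; substitute the names from the real API.
    Options left as None are omitted so the server applies its own defaults.
    """
    candidates = {
        "reason": reason,
        "keep_checkpoints": keep_checkpoints,
        "create_final_checkpoint": create_final_checkpoint,
        "force": force,
    }
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    body = build_cancel_body(
        reason="Validation loss plateaued",
        keep_checkpoints=True,
        create_final_checkpoint=True,
    )
    print(json.dumps(body, indent=2))
```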
Cancellation Process
The cancellation process follows these steps:
- Validation: Check if job can be cancelled (queued, initializing, running, or paused jobs only)
- Final Checkpoint: Create final checkpoint if requested and job is running
- Graceful Shutdown: Stop training process gracefully to preserve data integrity
- Resource Release: Release allocated GPUs, memory, and storage
- Status Update: Update job status to cancelled
- Cleanup: Remove temporary files (checkpoints are preserved if requested)
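The ordering of the steps above can be sketched as a small planning function. This is an illustrative model of the documented sequence, not the service's implementation; the step labels and the function name `plan_cancellation` are made up for the example.

```python
# States that the "Job States and Cancellation" table marks as cancellable.
CANCELLABLE_STATES = {"queued", "initializing", "running", "paused"}


def plan_cancellation(status, create_final_checkpoint=False):
    """Return the ordered steps a cancellation would go through.

    Raises ValueError when validation fails, i.e. the job is not cancellable.
    """
    if status not in CANCELLABLE_STATES:
        raise ValueError(f"job in state {status!r} cannot be cancelled")

    steps = ["validation"]
    # A final checkpoint is only created if requested AND the job is running.
    if create_final_checkpoint and status == "running":
        steps.append("final_checkpoint")
    steps += ["graceful_shutdown", "resource_release", "status_update", "cleanup"]
    return steps


if __name__ == "__main__":
    print(plan_cancellation("running", create_final_checkpoint=True))
```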
Job States and Cancellation
Current Status | Can Cancel | Behavior |
---|---|---|
queued | ✅ Yes | Immediate cancellation, no resources to release |
initializing | ✅ Yes | Stop initialization, release resources |
running | ✅ Yes | Graceful shutdown, optional final checkpoint |
paused | ✅ Yes | Cancel from paused state |
completed | ❌ No | Job already finished |
failed | ❌ No | Job already terminated |
cancelled | ❌ No | Job already cancelled |
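A client can mirror this table locally to avoid sending requests that are bound to be rejected. The lookup below is a minimal sketch that covers only the states listed in the table; the helper name `cancel_behavior` is hypothetical.

```python
# Behaviour per cancellable state, copied from the table above.
CANCEL_BEHAVIOR = {
    "queued": "Immediate cancellation, no resources to release",
    "initializing": "Stop initialization, release resources",
    "running": "Graceful shutdown, optional final checkpoint",
    "paused": "Cancel from paused state",
}

# Terminal states: the job has already finished, failed, or been cancelled.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def cancel_behavior(status):
    """Return the documented behaviour, or None if the job cannot be cancelled."""
    return CANCEL_BEHAVIOR.get(status)


if __name__ == "__main__":
    for state in ("queued", "running", "completed"):
        print(state, "->", cancel_behavior(state))
```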
Force Cancellation
Use the force parameter for jobs that are stuck in transitional states.
Force cancellation may result in data loss and should only be used when normal cancellation fails.
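One way to follow this advice is to escalate to a forced cancellation only after a normal attempt has failed to complete. The sketch below assumes two caller-supplied callables, both hypothetical: `send_cancel(force=...)` issues the cancellation request, and `wait_until_cancelled()` returns True once the job reports the cancelled status.

```python
def cancel_with_escalation(send_cancel, wait_until_cancelled):
    """Try a normal cancellation first and force only if it does not complete.

    Returns "normal", "forced", or "stuck" depending on which attempt worked.
    """
    send_cancel(force=False)
    if wait_until_cancelled():
        return "normal"

    # Normal cancellation did not finish: escalate, accepting possible data loss.
    send_cancel(force=True)
    return "forced" if wait_until_cancelled() else "stuck"
```

Keeping the transport and polling logic behind callables makes the escalation policy easy to test without a live job.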
Checkpoint Management
When cancelling a job, you have several checkpoint options:
- Preserve Existing Checkpoints
- Create Final Checkpoint
- Clean Cancellation
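These options plausibly correspond to combinations of the two checkpoint flags in the request body. The mapping below is an assumption rather than documented behaviour: the key names are hypothetical, and the reading of "Clean Cancellation" as discarding checkpoints is a guess.

```python
# Hypothetical flag combinations for each checkpoint option (assumed, not documented).
CHECKPOINT_PRESETS = {
    "preserve_existing": {"keep_checkpoints": True, "create_final_checkpoint": False},
    "create_final": {"keep_checkpoints": True, "create_final_checkpoint": True},
    "clean": {"keep_checkpoints": False, "create_final_checkpoint": False},
}


def checkpoint_options(preset):
    """Return a fresh copy of the flags for a preset so callers can modify it."""
    if preset not in CHECKPOINT_PRESETS:
        raise KeyError(f"unknown checkpoint preset: {preset!r}")
    return dict(CHECKPOINT_PRESETS[preset])


if __name__ == "__main__":
    print(checkpoint_options("create_final"))
```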
Cost Implications
Cancelling a job has the following cost implications:
- Incurred Costs: You pay for resources used up to the cancellation point
- No Future Charges: No additional charges after successful cancellation
- Checkpoint Storage: Preserved checkpoints continue to incur storage costs
- Early Termination: No penalties for early cancellation
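To make the arithmetic concrete, the sketch below estimates the cost of a cancelled job as compute used so far plus ongoing checkpoint storage. The pricing model (an hourly GPU rate and a per-GB monthly storage rate) and every parameter name are assumptions for illustration; actual billing terms are not described on this page.

```python
def estimate_cancellation_cost(gpu_hours_used, gpu_hourly_rate,
                               checkpoint_gb=0.0,
                               storage_rate_per_gb_month=0.0,
                               months_retained=0.0):
    """Estimate cost under an assumed hourly-compute / monthly-storage model."""
    # Incurred cost: resources used up to the cancellation point.
    compute = gpu_hours_used * gpu_hourly_rate
    # Preserved checkpoints keep accruing storage cost after cancellation.
    storage = checkpoint_gb * storage_rate_per_gb_month * months_retained
    # No early-termination penalty is added, matching the list above.
    return {"compute": compute, "storage": storage, "total": compute + storage}


if __name__ == "__main__":
    print(estimate_cancellation_cost(10, 2.0, checkpoint_gb=50,
                                     storage_rate_per_gb_month=0.02,
                                     months_retained=3))
```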
Common Use Cases
Early Stopping
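For early stopping, the client needs a rule for deciding when a job is no longer worth running. A common choice is a patience-based check on validation loss, sketched below. The function name, the thresholds, and the idea of tracking validation losses client-side are assumptions for this example.

```python
def should_stop_early(val_losses, patience=3, min_delta=0.0):
    """Return True when the last `patience` evaluations show no improvement.

    An evaluation counts as an improvement only if it beats the best earlier
    loss by at least `min_delta`.
    """
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent = val_losses[-patience:]
    return all(loss > best_before - min_delta for loss in recent)


if __name__ == "__main__":
    history = [1.00, 0.80, 0.79, 0.81, 0.80, 0.82]
    if should_stop_early(history, patience=3):
        print("No recent improvement: cancel the job with a reason such as "
              "'early stopping' and request a final checkpoint.")
```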
Resource Reallocation
Hyperparameter Adjustment
Best Practices
- Always provide a reason for cancellation to help with analysis and debugging
- Create final checkpoints for running jobs to preserve training progress
- Monitor the cancellation process as it may take a few minutes to complete
- Clean up unused checkpoints periodically to manage storage costs
- Use force cancellation sparingly and only when normal cancellation fails
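Monitoring the cancellation can be done with a simple polling loop. The sketch below assumes a caller-supplied `get_status()` callable (hypothetical) that returns the job's current status string. The five-minute default timeout is a guess based on "a few minutes" and is not a documented limit.

```python
import time


def wait_for_cancellation(get_status, timeout_s=300.0, interval_s=5.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reports 'cancelled' or the timeout elapses.

    Returns True if the job reached 'cancelled', False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "cancelled":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A False result is a reasonable trigger for investigating the job before considering force cancellation.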
Cancelled jobs remain in your job history for 30 days before being permanently removed. Checkpoints are preserved according to your retention settings.