Overview
The Stop Cluster endpoint stops a running GPU cluster, either gracefully (letting running processes finish) or forcefully (terminating them immediately). Use it to control costs, prepare for maintenance, and free up resources.
Endpoint
Path Parameters
Parameter | Type | Required | Description |
---|---|---|---|
cluster_id | string | Yes | Unique cluster identifier |
Request Body
Parameter | Type | Required | Description |
---|---|---|---|
force | boolean | No | Force immediate stop without graceful shutdown (default: false) |
grace_period_minutes | integer | No | Grace period for graceful shutdown (default: 5, max: 30) |
save_state | boolean | No | Create snapshot before stopping (default: false) |
snapshot_name | string | No | Custom name for the snapshot (used when save_state is true) |
preserve_data | boolean | No | Preserve data volumes (default: true) |
wait_for_completion | boolean | No | Wait for stop operation to complete (default: false) |
timeout_minutes | integer | No | Maximum wait time for completion (default: 10, max: 60) |
stop_reason | string | No | Reason for stopping (for audit logs) |
notify_users | array | No | User IDs to notify about the stop operation |
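The table above can be mirrored client-side so that out-of-range values are caught before a request is sent. This is a minimal sketch, assuming that notify_users holds string user IDs and that the lower bounds (0 for the grace period, 1 for the timeout) are acceptable; the table documents only defaults and maxima. The StopClusterRequest class is hypothetical and is not part of any official SDK.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class StopClusterRequest:
    """Hypothetical client-side model of the Stop Cluster request body."""
    force: bool = False
    grace_period_minutes: int = 5           # documented default 5, max 30
    save_state: bool = False
    snapshot_name: Optional[str] = None
    preserve_data: bool = True
    wait_for_completion: bool = False
    timeout_minutes: int = 10               # documented default 10, max 60
    stop_reason: Optional[str] = None
    notify_users: list[str] = field(default_factory=list)  # assumed: string user IDs

    def validate(self) -> None:
        # Maxima come from the table; the lower bounds are assumptions.
        if not 0 <= self.grace_period_minutes <= 30:
            raise ValueError("grace_period_minutes must be between 0 and 30")
        if not 1 <= self.timeout_minutes <= 60:
            raise ValueError("timeout_minutes must be between 1 and 60")
        # Assumption: a snapshot name only makes sense when a snapshot is taken.
        if self.snapshot_name is not None and not self.save_state:
            raise ValueError("snapshot_name requires save_state=True")

    def to_json_body(self) -> dict[str, Any]:
        """Validate, then drop fields left as None or [] instead of sending them."""
        self.validate()
        return {k: v for k, v in asdict(self).items() if v is not None and v != []}
```

For example, StopClusterRequest(save_state=True, snapshot_name="nightly").to_json_body() yields a body with the documented defaults filled in and no empty notify_users list.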
Request Examples
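A request example using only the Python standard library follows. The base URL, the /clusters/{cluster_id}/stop path, the POST method, and the API_KEY environment variable name are all assumptions for illustration; substitute the real values for this endpoint. The Bearer header format comes from the Authorizations section. Because wait_for_completion can make the server hold the request open for up to timeout_minutes, the socket timeout is derived from the body rather than fixed.

```python
import json
import os
import urllib.request
from urllib.parse import quote

# Hypothetical base URL and path layout: replace with the real values for this endpoint.
BASE_URL = "https://api.example.com/v1"


def read_timeout_seconds(body: dict) -> int:
    """Socket timeout for the call, long enough to cover wait_for_completion."""
    if body.get("wait_for_completion", False):
        # The server may hold the request for up to timeout_minutes (default 10),
        # so allow that long plus a 30-second margin.
        return body.get("timeout_minutes", 10) * 60 + 30
    return 30


def build_stop_request(cluster_id: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a request that stops one cluster (POST is assumed)."""
    url = f"{BASE_URL}/clusters/{quote(cluster_id, safe='')}/stop"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # format from the Authorizations section
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stop_cluster(cluster_id: str, body: dict) -> dict:
    """Send the request and decode the JSON object returned with HTTP 200."""
    # Hypothetical environment variable name for the API key.
    req = build_stop_request(cluster_id, body, os.environ["API_KEY"])
    with urllib.request.urlopen(req, timeout=read_timeout_seconds(body)) as resp:
        return json.load(resp)


# Example body: graceful stop with a snapshot, waiting up to 20 minutes.
example_body = {
    "grace_period_minutes": 10,
    "save_state": True,
    "snapshot_name": "pre-maintenance",
    "wait_for_completion": True,
    "timeout_minutes": 20,
    "stop_reason": "Scheduled maintenance",
}
# stop_cluster("your-cluster-id", example_body)
```

Separating build_stop_request from stop_cluster keeps the request inspectable without network access.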
Response Schema
Stop Progress Phases
When a cluster is stopped, the operation moves through the following phases:
Phase | Description | Typical Duration |
---|---|---|
notifying_processes | Sending termination signals to running processes | 10-30 seconds |
waiting_for_graceful_exit | Allowing processes to shut down cleanly | 1-15 minutes |
creating_snapshot | Creating state snapshot (if requested) | 30 seconds - 5 minutes |
terminating_resources | Releasing GPU and compute resources | 30-60 seconds |
cleaning_up | Final cleanup and state updates | 10-30 seconds |
completed | Stop operation finished | - |
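The typical durations above can be summed to pick a sensible timeout_minutes when using wait_for_completion. This is a rough sketch built only from the table's typical ranges, not a guarantee; it assumes that creating_snapshot runs only when save_state is true (as the table states) and that a forced stop skips waiting_for_graceful_exit (an assumption the table does not confirm).

```python
# Typical duration ranges per phase, in seconds, taken from the table above.
PHASE_DURATIONS = {
    "notifying_processes": (10, 30),
    "waiting_for_graceful_exit": (60, 15 * 60),
    "creating_snapshot": (30, 5 * 60),
    "terminating_resources": (30, 60),
    "cleaning_up": (10, 30),
}


def estimate_stop_seconds(save_state: bool, force: bool) -> tuple[int, int]:
    """Return a (low, high) estimate of total stop time from the typical ranges."""
    low = high = 0
    for phase, (lo, hi) in PHASE_DURATIONS.items():
        if phase == "creating_snapshot" and not save_state:
            continue  # the snapshot phase only runs when requested
        if phase == "waiting_for_graceful_exit" and force:
            continue  # assumption: forced stops skip the graceful wait
        low += lo
        high += hi
    return low, high


def suggested_timeout_minutes(save_state: bool, force: bool) -> int:
    """Round the high estimate up to whole minutes, capped at the documented max of 60."""
    _, high = estimate_stop_seconds(save_state, force)
    return min(60, -(-high // 60))  # ceiling division
```

By these typical ranges, a graceful stop with a snapshot can take up to 1320 seconds (22 minutes), and even without a snapshot up to 17 minutes, both longer than the default timeout_minutes of 10.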
Use Cases
Cost Optimization
Automatically stop idle clusters to save costs.
Scheduled Maintenance
Stop clusters for scheduled maintenance windows.
Training Completion Handler
Stop training clusters when their jobs complete, with proper state preservation.
Batch Cluster Management
Stop multiple clusters with different strategies based on their usage patterns.
Error Handling
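Only the 200 response is documented on this page, so the sketch below follows general HTTP conventions rather than this API's error specification: it retries 429 and 5xx responses with exponential backoff and raises immediately on anything else. It also assumes that repeating a stop request is safe; confirm that before relying on retries. The send argument is any zero-argument callable that performs the request (for example, a lambda wrapping a urllib.request.urlopen call), and StopClusterError is a hypothetical name.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: general HTTP conventions, not statuses documented for this endpoint.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class StopClusterError(RuntimeError):
    """Raised on a non-retryable status or when retries are exhausted."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"stop failed with HTTP {status}: {detail}")
        self.status = status


def with_retries(send: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying retryable HTTP errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            detail = err.read().decode("utf-8", errors="replace")
            if err.code not in RETRYABLE_STATUS or attempt == attempts - 1:
                raise StopClusterError(err.code, detail) from err
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

Injecting sleep keeps the backoff testable without real delays.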
Security Considerations
- Data Protection: Always preserve important data before stopping clusters
- Process Safety: Use appropriate grace periods for critical workloads
- Access Control: Verify permissions before stopping shared clusters
- Audit Logging: Include stop reasons for compliance and troubleshooting
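The checklist above can be enforced as a pre-flight check before a stop request is sent. This is a minimal sketch: the ClusterInfo fields (owner_id, shared) and the owner-or-admin rule are hypothetical, since this page does not define a cluster object or a permission model. The stop_reason, notify_users, save_state, and preserve_data keys are the documented request fields.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    # Hypothetical fields: this page does not define a cluster object.
    cluster_id: str
    owner_id: str
    shared: bool


def check_stop_allowed(cluster: ClusterInfo, caller_id: str, is_admin: bool,
                       body: dict) -> list[str]:
    """Return problems that should block the stop; an empty list means proceed."""
    problems = []
    if cluster.owner_id != caller_id and not is_admin:
        problems.append("caller is neither the owner nor an admin")
    if not body.get("stop_reason"):
        problems.append("stop_reason is required for the audit log")
    if cluster.shared and not body.get("notify_users"):
        problems.append("shared cluster: list users to notify in notify_users")
    if body.get("preserve_data") is False and not body.get("save_state"):
        problems.append("data volumes will not be preserved and no snapshot is taken")
    return problems
```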
Best Practices
- Graceful Shutdown: Always prefer graceful stops over forced termination
- State Preservation: Create snapshots for important work states
- Cost Monitoring: Stop idle clusters promptly instead of leaving them running unused
- Communication: Notify team members before stopping shared clusters
- Automation: Stop clusters automatically based on usage patterns, such as idle time or job completion
- Data Backup: Ensure critical data is backed up before stopping clusters
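Automated, usage-based stopping across many clusters (the Cost Optimization and Batch Cluster Management use cases) can be sketched as a planning step that maps each idle cluster to a request body. The ClusterUsage record, the workload labels, the 60-minute idle threshold, and the per-workload grace periods are all hypothetical choices; only the request fields and the 5-minute default and 30-minute maximum grace periods come from this page. No plan uses force, in line with preferring graceful shutdown.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    # Hypothetical usage record; real values would come from your own monitoring.
    cluster_id: str
    workload: str        # hypothetical labels: "training", "interactive", "batch"
    idle_minutes: int


def plan_stops(clusters: list[ClusterUsage], idle_threshold: int = 60) -> dict[str, dict]:
    """Map each idle cluster's ID to a Stop Cluster request body chosen by workload."""
    plans = {}
    for c in clusters:
        if c.idle_minutes < idle_threshold:
            continue  # still in use, leave it running
        body = {"stop_reason": f"idle for {c.idle_minutes} minutes", "preserve_data": True}
        if c.workload == "training":
            # Keep training state: snapshot and allow the maximum documented grace period.
            body.update(save_state=True, grace_period_minutes=30,
                        snapshot_name=f"{c.cluster_id}-auto")
        elif c.workload == "interactive":
            body.update(grace_period_minutes=10)
        else:
            body.update(grace_period_minutes=5)  # documented default
        plans[c.cluster_id] = body
    return plans
```

Each resulting body can then be sent individually, so one cluster's failure does not block the rest of the batch.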
Authorizations
API key authentication. Send the key in the Authorization header using the format 'Bearer YOUR_API_KEY'.
Response
200 - application/json: Cluster stop initiated. The response body is a JSON object.