Request Parameters
Alert status filter:
- active: Currently active alerts
- resolved: Recently resolved alerts
- all: All alerts regardless of status
- acknowledged: Alerts that have been acknowledged
- suppressed: Temporarily suppressed alerts
Filter by alert severity levels:
- critical: System down or severe impact
- high: High impact on performance or availability
- medium: Noticeable impact, requires attention
- low: Minor issues or early warnings
- info: Informational alerts
Alert categories to include:
- performance: Performance degradation alerts
- availability: Service availability issues
- capacity: Resource capacity warnings
- security: Security-related alerts
- cost: Cost threshold alerts
- maintenance: Maintenance and update alerts
Filter alerts by resource type:
- clusters: GPU cluster alerts
- endpoints: Serverless endpoint alerts
- training: Training job alerts
- ai-services: AI service alerts
- infrastructure: Platform infrastructure alerts
Time range for alert history:
- 1h: Last hour
- 6h: Last 6 hours
- 24h: Last 24 hours
- 7d: Last 7 days
- 30d: Last 30 days
Specific resource IDs to filter alerts for
Filter alerts by custom tags
Maximum number of alerts to return (1-500)
Number of alerts to skip for pagination
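The filters above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's client: the parameter names (`status`, `severity`, `category`, `resource_type`, `time_range`, `limit`, `offset`), the defaults, and the comma-joined encoding of multi-value filters are all assumptions. Only the allowed values and the 1-500 limit come from the lists above.

```python
from typing import Any, Dict, Iterable, Optional

# Allowed values, copied from the parameter lists above.
ALLOWED = {
    "status": {"active", "resolved", "all", "acknowledged", "suppressed"},
    "severity": {"critical", "high", "medium", "low", "info"},
    "category": {"performance", "availability", "capacity", "security", "cost", "maintenance"},
    "resource_type": {"clusters", "endpoints", "training", "ai-services", "infrastructure"},
    "time_range": {"1h", "6h", "24h", "7d", "30d"},
}


def _check(name: str, values: Iterable[str]) -> str:
    """Return the values joined by commas, rejecting any that are not allowed."""
    values = list(values)
    unknown = [v for v in values if v not in ALLOWED[name]]
    if unknown:
        raise ValueError(f"unsupported {name} value(s): {unknown}")
    return ",".join(values)


def build_alert_query(
    status: str = "active",
    severity: Optional[Iterable[str]] = None,
    category: Optional[Iterable[str]] = None,
    resource_type: Optional[Iterable[str]] = None,
    time_range: str = "24h",
    limit: int = 50,
    offset: int = 0,
) -> Dict[str, Any]:
    """Build a query-parameter dict. Names and defaults are hypothetical."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if offset < 0:
        raise ValueError("offset must not be negative")
    params: Dict[str, Any] = {
        "status": _check("status", [status]),
        "time_range": _check("time_range", [time_range]),
        "limit": limit,
        "offset": offset,
    }
    for name, values in (("severity", severity), ("category", category),
                         ("resource_type", resource_type)):
        if values is not None:
            params[name] = _check(name, values)
    return params


print(build_alert_query(severity=["critical", "high"], resource_type=["clusters"], limit=100))
```

Validating locally gives a clear error for a misspelled filter value instead of an empty result set.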
Response
Array of alert objects
Alert summary statistics
Pagination information
Example
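Because results are capped per request, reading a full alert history means advancing the skip count page by page. The sketch below assumes the response is a JSON object whose alert array sits under a key named `alerts`; that key, `fetch_page`, and `fake_fetch` are hypothetical stand-ins, and the real HTTP call is deliberately left out.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_alerts(fetch_page: Callable[[int, int], Page], limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every alert, advancing the offset until a short page comes back."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        alerts = page.get("alerts", [])  # hypothetical response key
        yield from alerts
        if len(alerts) < limit:
            break  # a short (or empty) page means there is nothing left
        offset += limit


def fake_fetch(limit: int, offset: int) -> Page:
    """Stand-in for the real request; serves five in-memory alerts."""
    data = [{"id": f"alert-{i}"} for i in range(5)]
    return {"alerts": data[offset:offset + limit]}


for alert in iter_alerts(fake_fetch, limit=2):
    print(alert["id"])
```

If the pagination information in the response reports a total count or a has-more flag, that is a more direct stop condition than the short-page check used here.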
Alert Management Operations
Create Custom Alert Rules
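A custom rule pairs a metric condition with a severity, a category, and a description. The sketch below models such a rule locally so it can be validated and tested before being submitted. The `AlertRule` class and all of its field names are hypothetical and do not reflect the service's actual rule schema.

```python
import operator
from dataclasses import asdict, dataclass

_COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_SEVERITIES = {"critical", "high", "medium", "low", "info"}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical local model of a threshold rule."""
    name: str
    metric: str
    comparator: str
    threshold: float
    severity: str
    category: str
    description: str

    def __post_init__(self) -> None:
        if self.comparator not in _COMPARATORS:
            raise ValueError(f"unsupported comparator: {self.comparator!r}")
        if self.severity not in _SEVERITIES:
            raise ValueError(f"unsupported severity: {self.severity!r}")

    def is_triggered(self, value: float) -> bool:
        """Return True when the observed value violates the threshold."""
        return _COMPARATORS[self.comparator](value, self.threshold)


rule = AlertRule(
    name="gpu-memory-high",
    metric="gpu_memory_percent",
    comparator=">",
    threshold=90.0,
    severity="high",
    category="capacity",
    description="GPU memory above 90%; consider scaling the cluster.",
)
print(rule.is_triggered(95.0), asdict(rule)["severity"])
```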
Acknowledge and Resolve Alerts
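Acknowledging and resolving are status changes, so it can help to think of an alert as a small state machine. The sketch below tracks this locally. The transition table is an assumption about which moves make sense, not documented service behavior, and the `Alert` class is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Assumed transitions between the documented statuses.
TRANSITIONS = {
    "active": {"acknowledged", "resolved", "suppressed"},
    "acknowledged": {"resolved"},
    "suppressed": {"active", "resolved"},
    "resolved": set(),
}


@dataclass
class Alert:
    alert_id: str
    status: str = "active"
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def _move(self, new_status: str, note: str) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((stamp, new_status, note))  # audit trail
        self.status = new_status

    def acknowledge(self, note: str = "") -> None:
        self._move("acknowledged", note)

    def resolve(self, note: str = "") -> None:
        self._move("resolved", note)


alert = Alert("alert-123")
alert.acknowledge("Investigating GPU memory pressure")
alert.resolve("Scaled cluster; memory back under threshold")
print(alert.status, len(alert.history))
```

Recording a note with each transition gives you the resolution history that post-incident reviews rely on.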
Bulk Alert Operations
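When many alerts need the same action, sending them in fixed-size batches keeps each request small and limits the damage if one batch fails. The sketch below is generic: `bulk_apply` and the `action` callable are hypothetical, and `action` stands in for whatever call acknowledges or resolves a list of alert IDs.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def bulk_apply(
    alert_ids: Sequence[str],
    action: Callable[[List[str]], None],
    batch_size: int = 100,
) -> Dict[str, List[str]]:
    """Run `action` once per batch and record which IDs succeeded or failed."""
    result: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for batch in chunked(alert_ids, batch_size):
        try:
            action(batch)
            result["succeeded"].extend(batch)
        except Exception:
            # Keep going so one bad batch does not block the rest.
            result["failed"].extend(batch)
    return result


sent: List[List[str]] = []
outcome = bulk_apply([f"alert-{i}" for i in range(5)], sent.append, batch_size=2)
print(sent, outcome["failed"])
```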
Advanced Alert Features
Smart Alert Grouping
Group related alerts to reduce noise.
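One simple way to group alerts is by a shared key, such as the affected resource and the alert category, and then report one summary per group. The sketch below does this client-side. The alert field names (`resource_id`, `category`, `severity`) are assumptions, and the platform's own grouping logic may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Ordered from most to least severe, following the documented severity levels.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def group_alerts(
    alerts: List[dict], keys: Sequence[str] = ("resource_id", "category")
) -> Dict[Tuple, List[dict]]:
    """Bucket alerts that share the same values for `keys`."""
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(k) for k in keys)].append(alert)
    return dict(groups)


def summarize(groups: Dict[Tuple, List[dict]]) -> List[dict]:
    """Produce one line per group with its size and worst severity."""
    summary = []
    for key, members in groups.items():
        worst = min((a["severity"] for a in members), key=SEVERITY_ORDER.index)
        summary.append({"key": key, "count": len(members), "highest_severity": worst})
    return summary


alerts = [
    {"id": "a1", "resource_id": "cluster-1", "category": "performance", "severity": "medium"},
    {"id": "a2", "resource_id": "cluster-1", "category": "performance", "severity": "critical"},
    {"id": "a3", "resource_id": "endpoint-7", "category": "availability", "severity": "high"},
]
for line in summarize(group_alerts(alerts)):
    print(line)
```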
Alert Correlation and Root Cause Analysis
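A basic form of correlation clusters alerts that fire close together in time and treats the earliest alert in each cluster as a candidate root cause. The sketch below implements only that time-window heuristic. It is not the platform's correlation engine, and the `timestamp` field (epoch seconds) and the five-minute window are assumptions.

```python
from typing import List


def correlate(alerts: List[dict], window_seconds: float = 300.0) -> List[List[dict]]:
    """Cluster alerts whose timestamp is within `window_seconds` of the previous one."""
    ordered = sorted(alerts, key=lambda a: a["timestamp"])
    clusters: List[List[dict]] = []
    for alert in ordered:
        if clusters and alert["timestamp"] - clusters[-1][-1]["timestamp"] <= window_seconds:
            clusters[-1].append(alert)
        else:
            clusters.append([alert])
    return clusters


def root_cause_candidate(cluster: List[dict]) -> dict:
    """Return the earliest alert in a cluster (clusters are time-ordered)."""
    return cluster[0]


alerts = [
    {"id": "latency-spike", "timestamp": 1060.0},
    {"id": "gpu-memory-full", "timestamp": 1000.0},
    {"id": "cost-threshold", "timestamp": 5000.0},
]
for cluster in correlate(alerts):
    print(root_cause_candidate(cluster)["id"], [a["id"] for a in cluster])
```

The earliest alert is only a starting point for investigation; ordering in time does not prove causation.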
Predictive Alerting
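Predictive alerting can be as simple as fitting a trend to recent metric samples and estimating when it will cross a threshold. The sketch below uses an ordinary least-squares line for this. It illustrates the idea only and is not the platform's prediction model; the function name and sample format are hypothetical.

```python
from typing import List, Optional, Tuple


def seconds_until_breach(samples: List[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Estimate seconds after the last sample until a fitted line reaches `threshold`.

    `samples` holds (time_in_seconds, value) pairs. Returns None when there is
    too little data or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # not trending toward the threshold
    intercept = mean_v - slope * mean_t
    breach_time = (threshold - intercept) / slope
    return max(0.0, breach_time - samples[-1][0])


# Disk usage (%) sampled once a minute, rising 10 points per sample.
usage = [(0.0, 50.0), (60.0, 60.0), (120.0, 70.0)]
print(seconds_until_breach(usage, threshold=90.0))
```

A result of `0.0` means the fitted line has already reached the threshold at the time of the last sample.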
Integration and Automation
Webhook Integration
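One use of a webhook is to keep your own copy of alert events. The sketch below shows only the receiving side: it parses a delivered request body and appends it to a JSON Lines file. The payload shape (a JSON object with an `alert_id` field) and the function name are assumptions, and a production receiver would also need to authenticate the sender.

```python
import json
from pathlib import Path


def archive_webhook_event(raw_body: bytes, archive_path: Path) -> dict:
    """Parse one webhook delivery and append it to a JSON Lines archive."""
    event = json.loads(raw_body.decode("utf-8"))
    # Hypothetical payload contract: an object carrying at least an alert_id.
    if not isinstance(event, dict) or "alert_id" not in event:
        raise ValueError("payload must be a JSON object with an 'alert_id' field")
    with archive_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

Call it from whichever web framework hosts your endpoint, for example `archive_webhook_event(request_body, Path("alerts.jsonl"))`. One JSON object per line keeps the archive easy to append to and to load into other tools later.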
Automated Response Actions
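Automated responses are often expressed as a lookup from alert attributes to handler functions. The sketch below dispatches on category and severity. The handlers only return a description of what they would do, and every name here (`scale_up`, `page_on_call`, `RESPONSES`, the alert fields) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], str]


def scale_up(alert: dict) -> str:
    """Placeholder action; a real handler would call your scaling tooling."""
    return f"scale-up requested for {alert['resource_id']}"


def page_on_call(alert: dict) -> str:
    """Placeholder action; a real handler would notify the on-call engineer."""
    return f"paged on-call for {alert['alert_id']}"


# (category, severity) -> handlers to run, in order.
RESPONSES: Dict[Tuple[str, str], List[Handler]] = {
    ("capacity", "high"): [scale_up],
    ("availability", "critical"): [page_on_call],
    ("capacity", "critical"): [scale_up, page_on_call],
}


def respond(alert: dict) -> List[str]:
    """Run every handler registered for the alert; do nothing if none match."""
    handlers = RESPONSES.get((alert["category"], alert["severity"]), [])
    return [handler(alert) for handler in handlers]


alert = {"alert_id": "a-42", "resource_id": "cluster-1",
         "category": "capacity", "severity": "critical"}
print(respond(alert))
```

Keeping the mapping in one table makes it easy to review which alerts can trigger automatic changes.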
Best Practices
Alert Configuration
- Meaningful Thresholds: Set thresholds based on actual impact, not arbitrary numbers
- Appropriate Severity: Match severity to business impact
- Clear Descriptions: Write clear, actionable alert descriptions
- Proper Categorization: Use consistent categories for easy filtering
Alert Management
- Timely Response: Acknowledge critical alerts within minutes
- Documentation: Document resolution steps for common issues
- Post-Incident Reviews: Analyze alerts after incidents to improve detection
- Regular Tuning: Regularly review and adjust alert thresholds
Noise Reduction
- Alert Grouping: Group related alerts to reduce noise
- Intelligent Suppression: Suppress redundant alerts during maintenance
- Escalation Policies: Implement proper escalation for unhandled alerts
- Regular Cleanup: Remove obsolete or ineffective alert rules
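Suppression during maintenance and escalation of unhandled alerts can be combined into one triage decision. The following is a minimal sketch. The 15-minute escalation deadline, the alert fields (`status`, `severity`, `created_at`), and the function names are illustrative assumptions, not platform defaults.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def in_maintenance(now: datetime, windows: List[Window]) -> bool:
    """Return True when `now` falls inside any (start, end) maintenance window."""
    return any(start <= now < end for start, end in windows)


def triage(alert: dict, now: datetime, windows: List[Window],
           max_unacked: timedelta = timedelta(minutes=15)) -> str:
    """Decide whether to suppress, escalate, or simply notify."""
    if in_maintenance(now, windows):
        return "suppress"
    unhandled = alert["status"] == "active"  # nobody has acknowledged it yet
    overdue = now - alert["created_at"] > max_unacked
    if unhandled and overdue and alert["severity"] == "critical":
        return "escalate"
    return "notify"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
alert = {"status": "active", "severity": "critical",
         "created_at": now - timedelta(minutes=20)}
print(triage(alert, now, windows=[]))
```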
Alert data is retained for 90 days. Configure webhook integrations to maintain longer historical records in your external systems.
Use alert correlation and predictive features to move from reactive to proactive monitoring. Focus on alerts that indicate real business impact rather than just technical metrics.