Monitor the health and readiness of your serverless endpoints with comprehensive health checks. This endpoint returns real-time status information about endpoint availability, resource health, and readiness to process requests.
Path Parameters
endpointId
: The unique identifier of the endpoint whose health should be checked
Query Parameters
check
: Type of health check to run (basic, detailed, or deep). Defaults to basic.
timeout
: Maximum time to wait for the health check, in seconds (1-30). Defaults to 10.
include
: Additional health metrics to include (dependencies, resources, connectivity).
Example Usage
Basic Health Check
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/health" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json"
Detailed Health Check with Resources
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/health?check=detailed&include=resources,dependencies" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json"
Deep Health Check with Full Diagnostics
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/health?check=deep&include=resources,dependencies,connectivity&timeout=30" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json"
Batch Health Check for Multiple Endpoints
curl -X POST "https://api.tensorone.ai/v2/endpoints/health/batch" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"endpointIds": [
"ep_1234567890abcdef",
"ep_2345678901bcdefg",
"ep_3456789012cdefgh"
],
"check": "detailed",
"include": ["resources"]
}'
Response
Healthy Endpoint Response
{
"endpointId": "ep_1234567890abcdef",
"status": "healthy",
"readiness": "ready",
"lastChecked": "2024-01-15T14:35:22Z",
"responseTime": 87,
"uptime": "72h 15m 30s",
"version": "1.2.3",
"checks": {
"api": {
"status": "healthy",
"responseTime": 45,
"lastError": null
},
"model": {
"status": "healthy",
"loadTime": 12.5,
"memoryUsage": "8.2GB",
"lastInference": "2024-01-15T14:34:15Z"
},
"dependencies": {
"status": "healthy",
"services": [
{
"name": "model_storage",
"status": "healthy",
"responseTime": 23
},
{
"name": "result_cache",
"status": "healthy",
"responseTime": 12
}
]
}
},
"resources": {
"gpu": {
"status": "healthy",
"utilization": 15,
"memory": {
"used": "2.1GB",
"total": "40GB",
"usage": 5.25
},
"temperature": 45,
"errors": []
},
"cpu": {
"status": "healthy",
"utilization": 8,
"cores": 16,
"loadAverage": [0.5, 0.3, 0.2]
},
"memory": {
"status": "healthy",
"used": "12.8GB",
"total": "64GB",
"usage": 20
},
"storage": {
"status": "healthy",
"used": "45GB",
"total": "500GB",
"usage": 9,
"iops": 150
}
},
"metrics": {
"requestsLastHour": 247,
"averageLatency": 1.8,
"errorRate": 0.2,
"successRate": 99.8
}
}
Unhealthy Endpoint Response
{
"endpointId": "ep_unhealthy_example",
"status": "unhealthy",
"readiness": "not_ready",
"lastChecked": "2024-01-15T14:35:22Z",
"responseTime": 5000,
"uptime": "2h 45m 12s",
"version": "1.2.3",
"issues": [
{
"type": "resource_constraint",
"severity": "high",
"component": "gpu_memory",
"message": "GPU memory usage at 98%, may cause OOM errors",
"timestamp": "2024-01-15T14:33:45Z",
"recommendation": "Reduce batch size or scale up to larger GPU"
},
{
"type": "dependency_failure",
"severity": "medium",
"component": "model_storage",
"message": "Model storage service responding slowly",
"timestamp": "2024-01-15T14:32:10Z",
"recommendation": "Check storage service health"
}
],
"checks": {
"api": {
"status": "healthy",
"responseTime": 125,
"lastError": null
},
"model": {
"status": "degraded",
"loadTime": 12.5,
"memoryUsage": "39.2GB",
"lastInference": "2024-01-15T14:34:15Z",
"warnings": ["High memory usage approaching limit"]
},
"dependencies": {
"status": "degraded",
"services": [
{
"name": "model_storage",
"status": "degraded",
"responseTime": 1250,
"error": "High latency detected"
},
{
"name": "result_cache",
"status": "healthy",
"responseTime": 45
}
]
}
},
"resources": {
"gpu": {
"status": "warning",
"utilization": 95,
"memory": {
"used": "39.2GB",
"total": "40GB",
"usage": 98
},
"temperature": 82,
"errors": [],
"warnings": ["High memory usage", "Elevated temperature"]
}
},
"recoveryActions": [
{
"action": "restart_endpoint",
"description": "Restart endpoint to clear memory leaks",
"estimated_downtime": "2-3 minutes"
},
{
"action": "scale_resources",
"description": "Scale to higher memory GPU",
"estimated_cost_increase": "20%"
}
]
}
Endpoint Starting Response
{
"endpointId": "ep_starting_example",
"status": "starting",
"readiness": "not_ready",
"lastChecked": "2024-01-15T14:35:22Z",
"responseTime": 0,
"uptime": "0s",
"version": "1.2.3",
"startupProgress": {
"phase": "loading_model",
"progress": 65,
"estimatedCompletion": "2024-01-15T14:37:00Z",
"phases": [
{
"name": "container_startup",
"status": "completed",
"duration": 15.2
},
{
"name": "dependency_loading",
"status": "completed",
"duration": 8.7
},
{
"name": "loading_model",
"status": "in_progress",
"progress": 65,
"estimatedRemaining": 45.3
},
{
"name": "warmup_inference",
"status": "pending",
"estimatedDuration": 12.0
}
]
},
"resources": {
"gpu": {
"status": "initializing",
"utilization": 45,
"memory": {
"used": "8.2GB",
"total": "40GB",
"usage": 20.5
}
}
}
}
Health Status Values
Overall Status
healthy
: Endpoint is fully operational and ready to serve requests
degraded
: Endpoint is operational but experiencing issues that may affect performance
unhealthy
: Endpoint has critical issues and may not process requests reliably
starting
: Endpoint is starting up and not yet ready
stopped
: Endpoint is intentionally stopped
error
: Endpoint is in an error state and requires intervention
Readiness Status
ready
: Endpoint can accept and process requests immediately
not_ready
: Endpoint cannot process requests (starting, errors, resource issues)
warming_up
: Endpoint is ready but may have increased latency due to cold start
Component Status
healthy
: Component is operating normally
degraded
: Component is functional but with reduced performance
unhealthy
: Component has critical issues
failed
: Component is not functioning
unknown
: Component status cannot be determined
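These values are intended to drive traffic decisions, not just dashboards. Below is a minimal sketch, assuming the response shapes shown above and the requests library, of a routing filter that keeps only usable endpoints in a load-balancer pool; the specific policy (accepting degraded and warming_up) is an illustrative choice, not a platform rule.

```python
import requests

API_BASE = "https://api.tensorone.ai/v2"
API_KEY = "YOUR_API_KEY"  # assumption: replace with a real key

def routable(endpoint_id):
    """Decide whether an endpoint should stay in the traffic pool.

    Policy used here (an illustrative choice): accept status "healthy" or
    "degraded", require readiness "ready" or "warming_up", and treat any
    non-200 response as not routable.
    """
    resp = requests.get(
        f"{API_BASE}/endpoints/{endpoint_id}/health",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    if resp.status_code != 200:
        return False
    health = resp.json()
    return (
        health.get("status") in ("healthy", "degraded")
        and health.get("readiness") in ("ready", "warming_up")
    )

pool = [ep for ep in ("ep_1234567890abcdef", "ep_2345678901bcdefg") if routable(ep)]
print("Routable endpoints:", pool)
```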
Health Check Types
Basic Health Check
- API endpoint responsiveness
- Basic resource availability
- Service uptime
Detailed Health Check
- All basic checks plus:
- Resource utilization metrics
- Dependency service status
- Performance metrics
- Recent error rates
Deep Health Check
- All detailed checks plus:
- Full system diagnostics
- Connectivity tests to all dependencies
- Model inference test
- Storage and network I/O tests
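Because the three check types differ in cost, a common pattern is to escalate only when a cheaper check looks suspicious. The sketch below assumes the query parameters documented above and the requests library; the escalation thresholds are illustrative.

```python
import requests

API_BASE = "https://api.tensorone.ai/v2"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def graduated_check(endpoint_id):
    """Run the cheap basic check first and escalate only when needed.

    Escalation policy (illustrative): a non-healthy basic result triggers a
    detailed check; an unhealthy detailed result triggers one deep check
    with the maximum documented timeout.
    """
    url = f"{API_BASE}/endpoints/{endpoint_id}/health"

    basic = requests.get(url, headers=HEADERS, params={"check": "basic"}, timeout=15).json()
    if basic.get("status") == "healthy":
        return basic

    detailed = requests.get(
        url,
        headers=HEADERS,
        params={"check": "detailed", "include": "resources,dependencies"},
        timeout=20,
    ).json()
    if detailed.get("status") != "unhealthy":
        return detailed

    # Deep checks are expensive, so they run last and only when necessary.
    return requests.get(
        url,
        headers=HEADERS,
        params={
            "check": "deep",
            "include": "resources,dependencies,connectivity",
            "timeout": 30,
        },
        timeout=35,
    ).json()
```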
Readiness Probes
Kubernetes-Style Readiness
# Readiness probe endpoint for orchestration platforms
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/ready" \
-H "Authorization: Bearer YOUR_API_KEY"
Readiness Response
{
"ready": true,
"checks": [
{
"name": "model_loaded",
"status": "pass"
},
{
"name": "resources_available",
"status": "pass"
},
{
"name": "dependencies_healthy",
"status": "pass"
}
],
"readinessGates": {
"model": true,
"resources": true,
"dependencies": true,
"networking": true
}
}
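A deployment pipeline can gate traffic promotion on the individual readinessGates rather than the top-level ready flag alone, which makes the blocking gate visible in CI logs. A minimal sketch, assuming the /ready response shape above and the requests library:

```python
import requests

def all_gates_pass(endpoint_id, api_key):
    """Return True only when every readiness gate reports true.

    Checking the individual readinessGates (rather than the top-level
    "ready" flag alone) makes the blocking gate visible in pipeline logs.
    """
    resp = requests.get(
        f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}/ready",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    body = resp.json()
    gates = body.get("readinessGates", {})
    failed = [name for name, passed in gates.items() if not passed]
    if failed:
        print(f"{endpoint_id}: promotion blocked by gates: {', '.join(failed)}")
    return bool(body.get("ready")) and not failed
```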
Liveness Probes
Basic Liveness Check
# Simple liveness probe
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/live" \
-H "Authorization: Bearer YOUR_API_KEY"
Liveness Response
{
"alive": true,
"timestamp": "2024-01-15T14:35:22Z",
"uptime": "72h 15m 30s",
"version": "1.2.3"
}
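Liveness probes are intentionally cheap, so they can be polled often and acted on only after several consecutive failures. A minimal watchdog sketch, assuming the /live response above; the poll interval, failure threshold, and print-based alert are placeholders:

```python
import time
import requests

def liveness_watchdog(endpoint_id, api_key, failures_before_alert=3, poll_interval=15):
    """Poll the /live probe and alert after consecutive failures.

    The print call stands in for a real pager or webhook; the interval and
    threshold are illustrative, not recommended values.
    """
    url = f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}/live"
    headers = {"Authorization": f"Bearer {api_key}"}
    consecutive_failures = 0
    while True:
        try:
            alive = requests.get(url, headers=headers, timeout=5).json().get("alive", False)
        except requests.RequestException:
            alive = False
        consecutive_failures = 0 if alive else consecutive_failures + 1
        if consecutive_failures >= failures_before_alert:
            print(f"ALERT: {endpoint_id} failed {consecutive_failures} consecutive liveness checks")
        time.sleep(poll_interval)
```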
Resource Health Monitoring
GPU Health Details
{
"gpu": {
"devices": [
{
"id": "gpu_0",
"model": "NVIDIA A100-SXM4-40GB",
"status": "healthy",
"utilization": 25,
"memory": {
"used": "8.2GB",
"total": "40GB",
"usage": 20.5
},
"temperature": 52,
"power": {
"current": "180W",
"max": "400W",
"usage": 45
},
"errors": [],
"warnings": [],
"lastMaintenance": "2024-01-10T09:00:00Z"
}
],
"driver": {
"version": "525.147.05",
"status": "healthy"
},
"cuda": {
"version": "12.2",
"status": "healthy"
}
}
}
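When resources are included in the check, the GPU section above can be scanned programmatically for devices that need attention. A small helper sketch, assuming that structure; the temperature and memory thresholds are illustrative defaults, not platform guidance:

```python
def summarize_gpu_health(gpu_section, temp_limit=80, memory_limit=90.0):
    """Collect alerts for GPU devices that need attention.

    Expects the GPU health structure shown above; thresholds are
    illustrative defaults.
    """
    alerts = []
    for device in gpu_section.get("devices", []):
        dev_id = device.get("id", "unknown")
        if device.get("temperature", 0) > temp_limit:
            alerts.append(f"{dev_id}: temperature {device['temperature']}°C exceeds {temp_limit}°C")
        if device.get("memory", {}).get("usage", 0) > memory_limit:
            alerts.append(f"{dev_id}: memory usage {device['memory']['usage']}% exceeds {memory_limit}%")
        for warning in device.get("warnings", []):
            alerts.append(f"{dev_id}: warning - {warning}")
        for error in device.get("errors", []):
            alerts.append(f"{dev_id}: error - {error}")
    return alerts
```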
Storage Health Details
{
"storage": {
"volumes": [
{
"mount": "/models",
"type": "ssd",
"size": "500GB",
"used": "45GB",
"available": "455GB",
"usage": 9,
"iops": {
"read": 850,
"write": 450
},
"latency": {
"read": 0.8,
"write": 1.2
},
"status": "healthy"
}
],
"cache": {
"size": "50GB",
"used": "12GB",
"hitRate": 94.5,
"status": "healthy"
}
}
}
Error Handling
404 Endpoint Not Found
{
"error": "ENDPOINT_NOT_FOUND",
"message": "Endpoint ep_invalid does not exist",
"details": {
"endpointId": "ep_invalid",
"suggestion": "Check endpoint ID or verify endpoint exists"
}
}
503 Service Unavailable
{
"error": "HEALTH_CHECK_FAILED",
"message": "Health check service temporarily unavailable",
"details": {
"reason": "Health monitoring system overloaded",
"retryAfter": 30,
"fallbackStatus": "unknown"
}
}
408 Timeout
{
"error": "HEALTH_CHECK_TIMEOUT",
"message": "Health check timed out after 30 seconds",
"details": {
"timeout": 30,
"partialResults": {
"api": "healthy",
"model": "timeout",
"dependencies": "unknown"
},
"recommendation": "Increase timeout or check endpoint performance"
}
}
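Callers should handle these error responses differently: 404 is not retryable, 503 advertises a retryAfter delay, and 408 may still carry partial results. A hedged sketch of that handling with the requests library; the three-attempt retry budget is an assumption:

```python
import time
import requests

def get_health_with_retries(endpoint_id, api_key, max_attempts=3):
    """Fetch health while honoring the documented error responses.

    404 is treated as fatal, 503 waits for the advertised retryAfter before
    retrying, and 408 returns the partial results included in the error
    body. The three-attempt budget is an illustrative choice.
    """
    url = f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}/health"
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(1, max_attempts + 1):
        resp = requests.get(url, headers=headers, timeout=35)
        if resp.status_code == 200:
            return resp.json()
        body = resp.json() if resp.content else {}
        if resp.status_code == 404:
            raise ValueError(body.get("message", f"Endpoint {endpoint_id} not found"))
        if resp.status_code == 503:
            wait = body.get("details", {}).get("retryAfter", 30)
            print(f"Health service busy; retrying in {wait}s (attempt {attempt}/{max_attempts})")
            time.sleep(wait)
            continue
        if resp.status_code == 408:
            return {"status": "unknown", "partialResults": body.get("details", {}).get("partialResults", {})}
        resp.raise_for_status()
    raise RuntimeError(f"Health check for {endpoint_id} did not succeed after {max_attempts} attempts")
```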
SDK Examples
Python SDK
from tensorone import TensorOneClient
import time
import asyncio
from datetime import datetime, timedelta
client = TensorOneClient(api_key="your_api_key")
# Basic health check
def check_endpoint_health(endpoint_id):
health = client.endpoints.get_health(endpoint_id)
print(f"Endpoint {endpoint_id}: {health.status}")
if health.status != "healthy":
print("Issues found:")
for issue in health.issues:
print(f" - {issue.severity}: {issue.message}")
return health
# Detailed health monitoring
def detailed_health_check(endpoint_id):
health = client.endpoints.get_health(
endpoint_id,
check="detailed",
include=["resources", "dependencies", "connectivity"]
)
print(f"Endpoint Health Report for {endpoint_id}")
print(f"Status: {health.status}")
print(f"Ready: {health.readiness}")
print(f"Uptime: {health.uptime}")
print(f"Response Time: {health.response_time}ms")
# Resource health
if health.resources:
gpu = health.resources.gpu
print(f"GPU: {gpu.utilization}% utilized, {gpu.memory.usage}% memory")
if gpu.warnings:
print("GPU Warnings:", ", ".join(gpu.warnings))
# Dependency health
if health.checks.dependencies:
print("Dependencies:")
for service in health.checks.dependencies.services:
print(f" {service.name}: {service.status} ({service.response_time}ms)")
return health
# Continuous health monitoring
async def monitor_endpoint_health(endpoint_id, interval=60):
"""Monitor endpoint health continuously"""
while True:
try:
health = client.endpoints.get_health(
endpoint_id,
check="detailed",
include=["resources"]
)
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"[{timestamp}] {endpoint_id}: {health.status}")
# Alert on issues
if health.status in ["unhealthy", "degraded"]:
print("⚠️ Issues detected:")
for issue in health.issues:
print(f" {issue.severity}: {issue.message}")
# Check if restart is recommended
if any(action.action == "restart_endpoint" for action in health.recovery_actions):
print("💡 Consider restarting endpoint")
# Monitor resource usage
if health.resources and health.resources.gpu:
gpu = health.resources.gpu
if gpu.memory.usage > 90:
print(f"🚨 High GPU memory usage: {gpu.memory.usage}%")
if gpu.temperature > 80:
print(f"🌡️ High GPU temperature: {gpu.temperature}°C")
except Exception as e:
print(f"Health check failed: {e}")
await asyncio.sleep(interval)
# Batch health checking
def check_multiple_endpoints(endpoint_ids):
health_results = client.endpoints.get_batch_health(
endpoint_ids=endpoint_ids,
check="detailed",
include=["resources"]
)
healthy_count = 0
degraded_count = 0
unhealthy_count = 0
for health in health_results:
if health.status == "healthy":
healthy_count += 1
elif health.status == "degraded":
degraded_count += 1
else:
unhealthy_count += 1
print(f"Health Summary:")
print(f" Healthy: {healthy_count}")
print(f" Degraded: {degraded_count}")
print(f" Unhealthy: {unhealthy_count}")
return health_results
# Readiness waiting
def wait_for_endpoint_ready(endpoint_id, timeout=300):
"""Wait for endpoint to become ready with timeout"""
start_time = time.time()
while time.time() - start_time < timeout:
health = client.endpoints.get_health(endpoint_id)
if health.readiness == "ready":
print(f"Endpoint {endpoint_id} is ready!")
return True
elif health.status == "starting":
progress = health.startup_progress
if progress:
print(f"Starting up: {progress.phase} ({progress.progress}%)")
else:
print(f"Current status: {health.status}")
time.sleep(5)
print(f"Timeout waiting for endpoint {endpoint_id} to become ready")
return False
# Usage examples
if __name__ == "__main__":
endpoint_id = "ep_1234567890abcdef"
# Basic health check
health = check_endpoint_health(endpoint_id)
# Detailed health check
detailed_health_check(endpoint_id)
# Wait for readiness
if wait_for_endpoint_ready(endpoint_id):
print("Endpoint is ready for requests!")
# Monitor multiple endpoints
endpoints = ["ep_1234567890abcdef", "ep_2345678901bcdefg"]
check_multiple_endpoints(endpoints)
# Start continuous monitoring (uncomment to run)
# asyncio.run(monitor_endpoint_health(endpoint_id))
JavaScript SDK
import { TensorOneClient } from "@tensorone/sdk";
const client = new TensorOneClient({ apiKey: "your_api_key" });
// Basic health check
async function checkEndpointHealth(endpointId) {
const health = await client.endpoints.getHealth(endpointId);
console.log(`Endpoint ${endpointId}: ${health.status}`);
if (health.status !== "healthy") {
console.log("Issues found:");
health.issues?.forEach(issue => {
console.log(` - ${issue.severity}: ${issue.message}`);
});
}
return health;
}
// Detailed health monitoring with real-time updates
async function monitorEndpointHealth(endpointId, options = {}) {
const { interval = 60000, alertThresholds = {} } = options;
const monitor = setInterval(async () => {
try {
const health = await client.endpoints.getHealth(endpointId, {
check: "detailed",
include: ["resources", "dependencies"]
});
const timestamp = new Date().toISOString();
console.log(`[${timestamp}] ${endpointId}: ${health.status}`);
// Resource monitoring with alerts
if (health.resources?.gpu) {
const gpu = health.resources.gpu;
const memoryThreshold = alertThresholds.gpuMemory || 90;
const tempThreshold = alertThresholds.gpuTemp || 80;
if (gpu.memory.usage > memoryThreshold) {
console.warn(`🚨 High GPU memory: ${gpu.memory.usage}%`);
}
if (gpu.temperature > tempThreshold) {
console.warn(`🌡️ High GPU temp: ${gpu.temperature}°C`);
}
}
// Performance monitoring
if (health.metrics) {
const errorRate = health.metrics.errorRate;
const latency = health.metrics.averageLatency;
if (errorRate > 5) {
console.warn(`📊 High error rate: ${errorRate}%`);
}
if (latency > 10) {
console.warn(`⏱️ High latency: ${latency}s`);
}
}
// Dependency monitoring
if (health.checks?.dependencies?.services) {
const unhealthyDeps = health.checks.dependencies.services
.filter(service => service.status !== "healthy");
if (unhealthyDeps.length > 0) {
console.warn("🔗 Unhealthy dependencies:");
unhealthyDeps.forEach(dep => {
console.warn(` ${dep.name}: ${dep.status}`);
});
}
}
} catch (error) {
console.error(`Health check failed for ${endpointId}:`, error);
}
}, interval);
return () => clearInterval(monitor);
}
// Readiness polling with async/await
async function waitForEndpointReady(endpointId, options = {}) {
const { timeout = 300000, pollInterval = 5000 } = options;
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
try {
const health = await client.endpoints.getHealth(endpointId);
if (health.readiness === "ready") {
console.log(`✅ Endpoint ${endpointId} is ready!`);
return true;
}
if (health.status === "starting" && health.startupProgress) {
const progress = health.startupProgress;
console.log(`⏳ Starting: ${progress.phase} (${progress.progress}%)`);
} else {
console.log(`📊 Status: ${health.status}, Readiness: ${health.readiness}`);
}
await new Promise(resolve => setTimeout(resolve, pollInterval));
} catch (error) {
console.error(`Error checking readiness:`, error);
await new Promise(resolve => setTimeout(resolve, pollInterval));
}
}
console.error(`❌ Timeout waiting for ${endpointId} to become ready`);
return false;
}
// Batch health checking with Promise.all
async function checkMultipleEndpoints(endpointIds, options = {}) {
const { check = "basic", include = [] } = options;
try {
const healthChecks = await Promise.allSettled(
endpointIds.map(id =>
client.endpoints.getHealth(id, { check, include })
)
);
const results = {
healthy: 0,
degraded: 0,
unhealthy: 0,
errors: 0
};
healthChecks.forEach((result, index) => {
const endpointId = endpointIds[index];
if (result.status === "fulfilled") {
const health = result.value;
results[health.status] = (results[health.status] || 0) + 1;
console.log(`${endpointId}: ${health.status}`);
if (health.status !== "healthy") {
health.issues?.forEach(issue => {
console.log(` ⚠️ ${issue.severity}: ${issue.message}`);
});
}
} else {
results.errors++;
console.error(`${endpointId}: Error - ${result.reason}`);
}
});
console.log("\n📊 Health Summary:");
Object.entries(results).forEach(([status, count]) => {
if (count > 0) {
console.log(` ${status}: ${count}`);
}
});
return healthChecks;
} catch (error) {
console.error("Batch health check failed:", error);
throw error;
}
}
// Health-based auto-scaling trigger
async function autoScaleBasedOnHealth(endpointIds, options = {}) {
const {
scaleUpThreshold = 80, // GPU utilization %
scaleDownThreshold = 20,
minInstances = 1,
maxInstances = 10
} = options;
const healthResults = await Promise.all(
endpointIds.map(id => client.endpoints.getHealth(id, {
check: "detailed",
include: ["resources"]
}))
);
let scaleRecommendations = [];
healthResults.forEach(health => {
if (!health.resources?.gpu) return;
const gpuUtil = health.resources.gpu.utilization;
const memUtil = health.resources.gpu.memory.usage;
if (gpuUtil > scaleUpThreshold || memUtil > 90) {
scaleRecommendations.push({
endpointId: health.endpointId,
action: "scale_up",
reason: `High utilization: GPU ${gpuUtil}%, Memory ${memUtil}%`,
priority: memUtil > 95 ? "high" : "medium"
});
} else if (gpuUtil < scaleDownThreshold && memUtil < 30) {
scaleRecommendations.push({
endpointId: health.endpointId,
action: "scale_down",
reason: `Low utilization: GPU ${gpuUtil}%, Memory ${memUtil}%`,
priority: "low"
});
}
});
return scaleRecommendations;
}
// Usage examples
async function main() {
const endpointId = "ep_1234567890abcdef";
const endpointIds = ["ep_1234567890abcdef", "ep_2345678901bcdefg"];
try {
// Basic health check
await checkEndpointHealth(endpointId);
// Wait for readiness
const isReady = await waitForEndpointReady(endpointId);
if (isReady) {
console.log("Endpoint is ready for requests!");
}
// Check multiple endpoints
await checkMultipleEndpoints(endpointIds, {
check: "detailed",
include: ["resources"]
});
// Auto-scaling recommendations
const scaleRecs = await autoScaleBasedOnHealth(endpointIds);
if (scaleRecs.length > 0) {
console.log("Scaling recommendations:");
scaleRecs.forEach(rec => {
console.log(` ${rec.endpointId}: ${rec.action} - ${rec.reason}`);
});
}
// Start continuous monitoring (uncomment to run)
// const stopMonitoring = await monitorEndpointHealth(endpointId, {
// interval: 30000,
// alertThresholds: { gpuMemory: 85, gpuTemp: 75 }
// });
// Stop monitoring after 5 minutes
// setTimeout(stopMonitoring, 5 * 60 * 1000);
} catch (error) {
console.error("Health monitoring error:", error);
}
}
main();
Use Cases
Production Monitoring
- Service Reliability: Monitor endpoint health in production environments
- Automated Alerts: Set up alerting based on health status changes
- Load Balancing: Route traffic away from unhealthy endpoints
- Capacity Planning: Monitor resource utilization trends
CI/CD Integration
- Deployment Validation: Verify endpoint health after deployments
- Rollback Triggers: Automatically rollback on health failures
- Readiness Gates: Wait for endpoints to be ready before promoting traffic
- Health-based Testing: Run tests only when endpoints are healthy
Auto-scaling and Orchestration
- Kubernetes Integration: Use as readiness and liveness probes
- Auto-scaling Triggers: Scale based on resource health metrics
- Failover Systems: Detect failures and switch to backup endpoints
- Maintenance Windows: Schedule maintenance based on health patterns
Development and Debugging
- Performance Optimization: Identify performance bottlenecks
- Resource Monitoring: Track resource usage during development
- Dependency Validation: Ensure all dependencies are healthy
- Cold Start Analysis: Monitor startup performance and optimization
Best Practices
Health Check Strategy
- Regular Monitoring: Implement regular health checks with appropriate intervals
- Graduated Checks: Use basic checks for frequent monitoring, detailed for diagnostics
- Timeout Management: Set appropriate timeouts based on expected response times
- Error Handling: Implement graceful handling of health check failures
- Check Frequency: Balance monitoring frequency with system load
- Batch Operations: Use batch health checks for multiple endpoints
- Caching: Cache health results for non-critical monitoring (a minimal caching sketch follows this list)
- Selective Inclusion: Only request detailed metrics when needed
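As referenced in the Caching item above, a small client-side TTL cache keeps non-critical monitors from issuing redundant calls. A minimal sketch; the 30-second TTL mirrors the server-side caching noted later on this page, and the injected fetch function is a placeholder for whichever client you use:

```python
import time

class CachedHealthClient:
    """Client-side TTL cache for non-critical health monitoring.

    The 30-second default mirrors the server-side caching noted below, so a
    shorter TTL mostly adds load without adding freshness. fetch_health is
    any callable that takes an endpoint ID and returns the health dict.
    """

    def __init__(self, fetch_health, ttl_seconds=30.0):
        self._fetch_health = fetch_health
        self._ttl = ttl_seconds
        self._cache = {}  # endpoint_id -> (fetched_at, health)

    def get(self, endpoint_id):
        now = time.monotonic()
        cached = self._cache.get(endpoint_id)
        if cached and now - cached[0] < self._ttl:
            return cached[1]
        health = self._fetch_health(endpoint_id)
        self._cache[endpoint_id] = (now, health)
        return health
```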
Alert Configuration
- Threshold Setting: Set appropriate thresholds for different severity levels
- Alert Fatigue: Prevent alert fatigue with intelligent alerting
- Escalation Paths: Define clear escalation procedures for different issue types
- Recovery Actions: Implement automated recovery actions where appropriate
Integration Patterns
- Circuit Breakers: Use health status to trigger circuit breaker patterns (a minimal breaker sketch follows this list)
- Service Mesh: Integrate with service mesh health checking
- Monitoring Tools: Export health metrics to monitoring and observability tools
- Documentation: Document health check interpretations and response procedures
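As referenced in the Circuit Breakers item above, health status can drive a simple breaker that stops sending traffic after repeated unhealthy results and probes again after a cooldown. A minimal sketch; the thresholds, cooldown, and injected check_health callable are illustrative:

```python
import time

class HealthCircuitBreaker:
    """Stop routing to an endpoint after repeated unhealthy results.

    check_health is any callable returning the health dict shown earlier.
    The failure threshold and cooldown are illustrative; tune them to your
    traffic and recovery characteristics.
    """

    def __init__(self, check_health, failure_threshold=3, cooldown_seconds=60.0):
        self._check_health = check_health
        self._failure_threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._failures = 0
        self._opened_at = None

    def allow_request(self, endpoint_id):
        # While open, block traffic until the cooldown elapses, then allow one probe.
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self._cooldown:
                return False
            self._opened_at = None  # half-open: the next check decides

        status = self._check_health(endpoint_id).get("status")
        if status in ("healthy", "degraded"):
            self._failures = 0
            return True

        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._opened_at = time.monotonic()
        return False
```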
Health checks are cached for 30 seconds to reduce system load. For real-time status updates, use the streaming endpoints or webhook notifications.
Deep health checks consume more resources and should be used sparingly in production. Use basic or detailed checks for regular monitoring.
Set up automated recovery actions based on health status to reduce manual intervention and improve system reliability. Consider implementing circuit breaker patterns for improved resilience.
Authentication requires an API key passed in the Authorization header using the 'Bearer YOUR_API_KEY' format.
The response is of type object.