Monitor the health and readiness of your serverless endpoints with comprehensive health checks. This endpoint returns real-time status information about endpoint availability, resource health, and readiness to process requests.
Path Parameters
endpointId
: The unique identifier of the endpoint whose health should be checked
Query Parameters
check
: Type of health check to run (basic, detailed, or deep). Defaults to basic.
timeout
: Maximum time to wait for the health check, in seconds (1-30). Defaults to 10.
include
: Additional health metrics to include (dependencies, resources, connectivity).
Example Usage
Basic Health Check
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/health" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json"
Detailed Health Check with Resources
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/health?check=detailed&include=resources,dependencies" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json"
Deep Health Check with Full Diagnostics
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/health?check=deep&include=resources,dependencies,connectivity&timeout=30" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json"
Batch Health Check for Multiple Endpoints
curl -X POST "https://api.tensorone.ai/v2/endpoints/health/batch" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"endpointIds": [
"ep_1234567890abcdef",
"ep_2345678901bcdefg",
"ep_3456789012cdefgh"
],
"check": "detailed",
"include": ["resources"]
}'
Response
Healthy Endpoint Response
{
"endpointId": "ep_1234567890abcdef",
"status": "healthy",
"readiness": "ready",
"lastChecked": "2024-01-15T14:35:22Z",
"responseTime": 87,
"uptime": "72h 15m 30s",
"version": "1.2.3",
"checks": {
"api": {
"status": "healthy",
"responseTime": 45,
"lastError": null
},
"model": {
"status": "healthy",
"loadTime": 12.5,
"memoryUsage": "8.2GB",
"lastInference": "2024-01-15T14:34:15Z"
},
"dependencies": {
"status": "healthy",
"services": [
{
"name": "model_storage",
"status": "healthy",
"responseTime": 23
},
{
"name": "result_cache",
"status": "healthy",
"responseTime": 12
}
]
}
},
"resources": {
"gpu": {
"status": "healthy",
"utilization": 15,
"memory": {
"used": "2.1GB",
"total": "40GB",
"usage": 5.25
},
"temperature": 45,
"errors": []
},
"cpu": {
"status": "healthy",
"utilization": 8,
"cores": 16,
"loadAverage": [0.5, 0.3, 0.2]
},
"memory": {
"status": "healthy",
"used": "12.8GB",
"total": "64GB",
"usage": 20
},
"storage": {
"status": "healthy",
"used": "45GB",
"total": "500GB",
"usage": 9,
"iops": 150
}
},
"metrics": {
"requestsLastHour": 247,
"averageLatency": 1.8,
"errorRate": 0.2,
"successRate": 99.8
}
}
Unhealthy Endpoint Response
{
"endpointId": "ep_unhealthy_example",
"status": "unhealthy",
"readiness": "not_ready",
"lastChecked": "2024-01-15T14:35:22Z",
"responseTime": 5000,
"uptime": "2h 45m 12s",
"version": "1.2.3",
"issues": [
{
"type": "resource_constraint",
"severity": "high",
"component": "gpu_memory",
"message": "GPU memory usage at 98%, may cause OOM errors",
"timestamp": "2024-01-15T14:33:45Z",
"recommendation": "Reduce batch size or scale up to larger GPU"
},
{
"type": "dependency_failure",
"severity": "medium",
"component": "model_storage",
"message": "Model storage service responding slowly",
"timestamp": "2024-01-15T14:32:10Z",
"recommendation": "Check storage service health"
}
],
"checks": {
"api": {
"status": "healthy",
"responseTime": 125,
"lastError": null
},
"model": {
"status": "degraded",
"loadTime": 12.5,
"memoryUsage": "39.2GB",
"lastInference": "2024-01-15T14:34:15Z",
"warnings": ["High memory usage approaching limit"]
},
"dependencies": {
"status": "degraded",
"services": [
{
"name": "model_storage",
"status": "degraded",
"responseTime": 1250,
"error": "High latency detected"
},
{
"name": "result_cache",
"status": "healthy",
"responseTime": 45
}
]
}
},
"resources": {
"gpu": {
"status": "warning",
"utilization": 95,
"memory": {
"used": "39.2GB",
"total": "40GB",
"usage": 98
},
"temperature": 82,
"errors": [],
"warnings": ["High memory usage", "Elevated temperature"]
}
},
"recoveryActions": [
{
"action": "restart_endpoint",
"description": "Restart endpoint to clear memory leaks",
"estimated_downtime": "2-3 minutes"
},
{
"action": "scale_resources",
"description": "Scale to higher memory GPU",
"estimated_cost_increase": "20%"
}
]
}
Endpoint Starting Response
{
"endpointId": "ep_starting_example",
"status": "starting",
"readiness": "not_ready",
"lastChecked": "2024-01-15T14:35:22Z",
"responseTime": 0,
"uptime": "0s",
"version": "1.2.3",
"startupProgress": {
"phase": "loading_model",
"progress": 65,
"estimatedCompletion": "2024-01-15T14:37:00Z",
"phases": [
{
"name": "container_startup",
"status": "completed",
"duration": 15.2
},
{
"name": "dependency_loading",
"status": "completed",
"duration": 8.7
},
{
"name": "loading_model",
"status": "in_progress",
"progress": 65,
"estimatedRemaining": 45.3
},
{
"name": "warmup_inference",
"status": "pending",
"estimatedDuration": 12.0
}
]
},
"resources": {
"gpu": {
"status": "initializing",
"utilization": 45,
"memory": {
"used": "8.2GB",
"total": "40GB",
"usage": 20.5
}
}
}
}
Health Status Values
Overall Status
healthy
: Endpoint is fully operational and ready to serve requests
degraded
: Endpoint is operational but experiencing issues that may affect performance
unhealthy
: Endpoint has critical issues and may not process requests reliably
starting
: Endpoint is starting up and not yet ready
stopped
: Endpoint is intentionally stopped
error
: Endpoint is in an error state and requires intervention
Readiness Status
ready
: Endpoint can accept and process requests immediately
not_ready
: Endpoint cannot process requests (starting, errors, resource issues)
warming_up
: Endpoint is ready but may have increased latency due to cold start
Component Status
healthy
: Component is operating normally
degraded
: Component is functional but with reduced performance
unhealthy
: Component has critical issues
failed
: Component is not functioning
unknown
: Component status cannot be determined
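These values are intended to drive traffic decisions, not just dashboards. Below is a minimal sketch, assuming the response shapes shown above and the requests library, of a routing filter that keeps only usable endpoints in a load-balancer pool; the specific policy (accepting degraded and warming_up) is an illustrative choice, not a platform rule.

```python
import requests

API_BASE = "https://api.tensorone.ai/v2"
API_KEY = "YOUR_API_KEY"  # assumption: replace with a real key

def routable(endpoint_id):
    """Decide whether an endpoint should stay in the traffic pool.

    Policy used here (an illustrative choice): accept status "healthy" or
    "degraded", require readiness "ready" or "warming_up", and treat any
    non-200 response as not routable.
    """
    resp = requests.get(
        f"{API_BASE}/endpoints/{endpoint_id}/health",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    if resp.status_code != 200:
        return False
    health = resp.json()
    return (
        health.get("status") in ("healthy", "degraded")
        and health.get("readiness") in ("ready", "warming_up")
    )

pool = [ep for ep in ("ep_1234567890abcdef", "ep_2345678901bcdefg") if routable(ep)]
print("Routable endpoints:", pool)
```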
Health Check Types
Basic Health Check
- API endpoint responsiveness
- Basic resource availability
- Service uptime
Detailed Health Check
- All basic checks plus:
- Resource utilization metrics
- Dependency service status
- Performance metrics
- Recent error rates
Deep Health Check
- All detailed checks plus:
- Full system diagnostics
- Connectivity tests to all dependencies
- Model inference test
- Storage and network I/O tests
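Because the three check types differ in cost, a common pattern is to escalate only when a cheaper check looks suspicious. The sketch below assumes the query parameters documented above and the requests library; the escalation thresholds are illustrative.

```python
import requests

API_BASE = "https://api.tensorone.ai/v2"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def graduated_check(endpoint_id):
    """Run the cheap basic check first and escalate only when needed.

    Escalation policy (illustrative): a non-healthy basic result triggers a
    detailed check; an unhealthy detailed result triggers one deep check
    with the maximum documented timeout.
    """
    url = f"{API_BASE}/endpoints/{endpoint_id}/health"

    basic = requests.get(url, headers=HEADERS, params={"check": "basic"}, timeout=15).json()
    if basic.get("status") == "healthy":
        return basic

    detailed = requests.get(
        url,
        headers=HEADERS,
        params={"check": "detailed", "include": "resources,dependencies"},
        timeout=20,
    ).json()
    if detailed.get("status") != "unhealthy":
        return detailed

    # Deep checks are expensive, so they run last and only when necessary.
    return requests.get(
        url,
        headers=HEADERS,
        params={
            "check": "deep",
            "include": "resources,dependencies,connectivity",
            "timeout": 30,
        },
        timeout=35,
    ).json()
```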
Readiness Probes
Kubernetes-Style Readiness
# Readiness probe endpoint for orchestration platforms
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/ready" \
-H "Authorization: Bearer YOUR_API_KEY"
Readiness Response
{
"ready": true,
"checks": [
{
"name": "model_loaded",
"status": "pass"
},
{
"name": "resources_available",
"status": "pass"
},
{
"name": "dependencies_healthy",
"status": "pass"
}
],
"readinessGates": {
"model": true,
"resources": true,
"dependencies": true,
"networking": true
}
}
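A deployment pipeline can gate traffic promotion on the individual readinessGates rather than the top-level ready flag alone, which makes the blocking gate visible in CI logs. A minimal sketch, assuming the /ready response shape above and the requests library:

```python
import requests

def all_gates_pass(endpoint_id, api_key):
    """Return True only when every readiness gate reports true.

    Checking the individual readinessGates (rather than the top-level
    "ready" flag alone) makes the blocking gate visible in pipeline logs.
    """
    resp = requests.get(
        f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}/ready",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    body = resp.json()
    gates = body.get("readinessGates", {})
    failed = [name for name, passed in gates.items() if not passed]
    if failed:
        print(f"{endpoint_id}: promotion blocked by gates: {', '.join(failed)}")
    return bool(body.get("ready")) and not failed
```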
Liveness Probes
Basic Liveness Check
# Simple liveness probe
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/live" \
-H "Authorization: Bearer YOUR_API_KEY"
Liveness Response
{
"alive": true,
"timestamp": "2024-01-15T14:35:22Z",
"uptime": "72h 15m 30s",
"version": "1.2.3"
}
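Liveness probes are intentionally cheap, so they can be polled often and acted on only after several consecutive failures. A minimal watchdog sketch, assuming the /live response above; the poll interval, failure threshold, and print-based alert are placeholders:

```python
import time
import requests

def liveness_watchdog(endpoint_id, api_key, failures_before_alert=3, poll_interval=15):
    """Poll the /live probe and alert after consecutive failures.

    The print call stands in for a real pager or webhook; the interval and
    threshold are illustrative, not recommended values.
    """
    url = f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}/live"
    headers = {"Authorization": f"Bearer {api_key}"}
    consecutive_failures = 0
    while True:
        try:
            alive = requests.get(url, headers=headers, timeout=5).json().get("alive", False)
        except requests.RequestException:
            alive = False
        consecutive_failures = 0 if alive else consecutive_failures + 1
        if consecutive_failures >= failures_before_alert:
            print(f"ALERT: {endpoint_id} failed {consecutive_failures} consecutive liveness checks")
        time.sleep(poll_interval)
```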
Resource Health Monitoring
GPU Health Details
{
"gpu": {
"devices": [
{
"id": "gpu_0",
"model": "NVIDIA A100-SXM4-40GB",
"status": "healthy",
"utilization": 25,
"memory": {
"used": "8.2GB",
"total": "40GB",
"usage": 20.5
},
"temperature": 52,
"power": {
"current": "180W",
"max": "400W",
"usage": 45
},
"errors": [],
"warnings": [],
"lastMaintenance": "2024-01-10T09:00:00Z"
}
],
"driver": {
"version": "525.147.05",
"status": "healthy"
},
"cuda": {
"version": "12.2",
"status": "healthy"
}
}
}
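When resources are included in the check, the GPU section above can be scanned programmatically for devices that need attention. A small helper sketch, assuming that structure; the temperature and memory thresholds are illustrative defaults, not platform guidance:

```python
def summarize_gpu_health(gpu_section, temp_limit=80, memory_limit=90.0):
    """Collect alerts for GPU devices that need attention.

    Expects the GPU health structure shown above; thresholds are
    illustrative defaults.
    """
    alerts = []
    for device in gpu_section.get("devices", []):
        dev_id = device.get("id", "unknown")
        if device.get("temperature", 0) > temp_limit:
            alerts.append(f"{dev_id}: temperature {device['temperature']}°C exceeds {temp_limit}°C")
        if device.get("memory", {}).get("usage", 0) > memory_limit:
            alerts.append(f"{dev_id}: memory usage {device['memory']['usage']}% exceeds {memory_limit}%")
        for warning in device.get("warnings", []):
            alerts.append(f"{dev_id}: warning - {warning}")
        for error in device.get("errors", []):
            alerts.append(f"{dev_id}: error - {error}")
    return alerts
```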
Storage Health Details
{
"storage": {
"volumes": [
{
"mount": "/models",
"type": "ssd",
"size": "500GB",
"used": "45GB",
"available": "455GB",
"usage": 9,
"iops": {
"read": 850,
"write": 450
},
"latency": {
"read": 0.8,
"write": 1.2
},
"status": "healthy"
}
],
"cache": {
"size": "50GB",
"used": "12GB",
"hitRate": 94.5,
"status": "healthy"
}
}
}
Error Handling
404 Endpoint Not Found
{
"error": "ENDPOINT_NOT_FOUND",
"message": "Endpoint ep_invalid does not exist",
"details": {
"endpointId": "ep_invalid",
"suggestion": "Check endpoint ID or verify endpoint exists"
}
}
503 Service Unavailable
{
"error": "HEALTH_CHECK_FAILED",
"message": "Health check service temporarily unavailable",
"details": {
"reason": "Health monitoring system overloaded",
"retryAfter": 30,
"fallbackStatus": "unknown"
}
}
408 Timeout
{
"error": "HEALTH_CHECK_TIMEOUT",
"message": "Health check timed out after 30 seconds",
"details": {
"timeout": 30,
"partialResults": {
"api": "healthy",
"model": "timeout",
"dependencies": "unknown"
},
"recommendation": "Increase timeout or check endpoint performance"
}
}
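Callers should handle these error responses differently: 404 is not retryable, 503 advertises a retryAfter delay, and 408 may still carry partial results. A hedged sketch of that handling with the requests library; the three-attempt retry budget is an assumption:

```python
import time
import requests

def get_health_with_retries(endpoint_id, api_key, max_attempts=3):
    """Fetch health while honoring the documented error responses.

    404 is treated as fatal, 503 waits for the advertised retryAfter before
    retrying, and 408 returns the partial results included in the error
    body. The three-attempt budget is an illustrative choice.
    """
    url = f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}/health"
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(1, max_attempts + 1):
        resp = requests.get(url, headers=headers, timeout=35)
        if resp.status_code == 200:
            return resp.json()
        body = resp.json() if resp.content else {}
        if resp.status_code == 404:
            raise ValueError(body.get("message", f"Endpoint {endpoint_id} not found"))
        if resp.status_code == 503:
            wait = body.get("details", {}).get("retryAfter", 30)
            print(f"Health service busy; retrying in {wait}s (attempt {attempt}/{max_attempts})")
            time.sleep(wait)
            continue
        if resp.status_code == 408:
            return {"status": "unknown", "partialResults": body.get("details", {}).get("partialResults", {})}
        resp.raise_for_status()
    raise RuntimeError(f"Health check for {endpoint_id} did not succeed after {max_attempts} attempts")
```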
SDK Examples
Python SDK
from tensorone import TensorOneClient
import time
import asyncio
from datetime import datetime, timedelta
client = TensorOneClient(api_key="your_api_key")
# Basic health check
def check_endpoint_health(endpoint_id):
health = client.endpoints.get_health(endpoint_id)
print(f"Endpoint {endpoint_id}: {health.status}")
if health.status != "healthy":
print("Issues found:")
for issue in health.issues:
print(f" - {issue.severity}: {issue.message}")
return health
# Detailed health monitoring
def detailed_health_check(endpoint_id):
health = client.endpoints.get_health(
endpoint_id,
check="detailed",
include=["resources", "dependencies", "connectivity"]
)
print(f"Endpoint Health Report for {endpoint_id}")
print(f"Status: {health.status}")
print(f"Ready: {health.readiness}")
print(f"Uptime: {health.uptime}")
print(f"Response Time: {health.response_time}ms")
# Resource health
if health.resources:
gpu = health.resources.gpu
print(f"GPU: {gpu.utilization}% utilized, {gpu.memory.usage}% memory")
if gpu.warnings:
print("GPU Warnings:", ", ".join(gpu.warnings))
# Dependency health
if health.checks.dependencies:
print("Dependencies:")
for service in health.checks.dependencies.services:
print(f" {service.name}: {service.status} ({service.response_time}ms)")
return health
# Continuous health monitoring
async def monitor_endpoint_health(endpoint_id, interval=60):
"""Monitor endpoint health continuously"""
while True:
try:
health = client.endpoints.get_health(
endpoint_id,
check="detailed",
include=["resources"]
)
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"[{timestamp}] {endpoint_id}: {health.status}")
# Alert on issues
if health.status in ["unhealthy", "degraded"]:
print("⚠️ Issues detected:")
for issue in health.issues:
print(f" {issue.severity}: {issue.message}")
# Check if restart is recommended
if any(action.action == "restart_endpoint" for action in health.recovery_actions):
print("💡 Consider restarting endpoint")
# Monitor resource usage
if health.resources and health.resources.gpu:
gpu = health.resources.gpu
if gpu.memory.usage > 90:
print(f"🚨 High GPU memory usage: {gpu.memory.usage}%")
if gpu.temperature > 80:
print(f"🌡️ High GPU temperature: {gpu.temperature}°C")
except Exception as e:
print(f"Health check failed: {e}")
await asyncio.sleep(interval)
# Batch health checking
def check_multiple_endpoints(endpoint_ids):
health_results = client.endpoints.get_batch_health(
endpoint_ids=endpoint_ids,
check="detailed",
include=["resources"]
)
healthy_count = 0
degraded_count = 0
unhealthy_count = 0
for health in health_results:
if health.status == "healthy":
healthy_count += 1
elif health.status == "degraded":
degraded_count += 1
else:
unhealthy_count += 1
print(f"Health Summary:")
print(f" Healthy: {healthy_count}")
print(f" Degraded: {degraded_count}")
print(f" Unhealthy: {unhealthy_count}")
return health_results
# Readiness waiting
def wait_for_endpoint_ready(endpoint_id, timeout=300):
"""Wait for endpoint to become ready with timeout"""
start_time = time.time()
while time.time() - start_time < timeout:
health = client.endpoints.get_health(endpoint_id)
if health.readiness == "ready":
print(f"Endpoint {endpoint_id} is ready!")
return True
elif health.status == "starting":
progress = health.startup_progress
if progress:
print(f"Starting up: {progress.phase} ({progress.progress}%)")
else:
print(f"Current status: {health.status}")
time.sleep(5)
print(f"Timeout waiting for endpoint {endpoint_id} to become ready")
return False
# Usage examples
if __name__ == "__main__":
endpoint_id = "ep_1234567890abcdef"
# Basic health check
health = check_endpoint_health(endpoint_id)
# Detailed health check
detailed_health_check(endpoint_id)
# Wait for readiness
if wait_for_endpoint_ready(endpoint_id):
print("Endpoint is ready for requests!")
# Monitor multiple endpoints
endpoints = ["ep_1234567890abcdef", "ep_2345678901bcdefg"]
check_multiple_endpoints(endpoints)
# Start continuous monitoring (uncomment to run)
# asyncio.run(monitor_endpoint_health(endpoint_id))
JavaScript SDK
import { TensorOneClient } from "@tensorone/sdk";
const client = new TensorOneClient({ apiKey: "your_api_key" });
// Basic health check
async function checkEndpointHealth(endpointId) {
const health = await client.endpoints.getHealth(endpointId);
console.log(`Endpoint ${endpointId}: ${health.status}`);
if (health.status !== "healthy") {
console.log("Issues found:");
health.issues?.forEach(issue => {
console.log(` - ${issue.severity}: ${issue.message}`);
});
}
return health;
}
// Detailed health monitoring with real-time updates
async function monitorEndpointHealth(endpointId, options = {}) {
const { interval = 60000, alertThresholds = {} } = options;
const monitor = setInterval(async () => {
try {
const health = await client.endpoints.getHealth(endpointId, {
check: "detailed",
include: ["resources", "dependencies"]
});
const timestamp = new Date().toISOString();
console.log(`[${timestamp}] ${endpointId}: ${health.status}`);
// Resource monitoring with alerts
if (health.resources?.gpu) {
const gpu = health.resources.gpu;
const memoryThreshold = alertThresholds.gpuMemory || 90;
const tempThreshold = alertThresholds.gpuTemp || 80;
if (gpu.memory.usage > memoryThreshold) {
console.warn(`🚨 High GPU memory: ${gpu.memory.usage}%`);
}
if (gpu.temperature > tempThreshold) {
console.warn(`🌡️ High GPU temp: ${gpu.temperature}°C`);
}
}
// Performance monitoring
if (health.metrics) {
const errorRate = health.metrics.errorRate;
const latency = health.metrics.averageLatency;
if (errorRate > 5) {
console.warn(`📊 High error rate: ${errorRate}%`);
}
if (latency > 10) {
console.warn(`⏱️ High latency: ${latency}s`);
}
}
// Dependency monitoring
if (health.checks?.dependencies?.services) {
const unhealthyDeps = health.checks.dependencies.services
.filter(service => service.status !== "healthy");
if (unhealthyDeps.length > 0) {
console.warn("🔗 Unhealthy dependencies:");
unhealthyDeps.forEach(dep => {
console.warn(` ${dep.name}: ${dep.status}`);
});
}
}
} catch (error) {
console.error(`Health check failed for ${endpointId}:`, error);
}
}, interval);
return () => clearInterval(monitor);
}
// Readiness polling with async/await
async function waitForEndpointReady(endpointId, options = {}) {
const { timeout = 300000, pollInterval = 5000 } = options;
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
try {
const health = await client.endpoints.getHealth(endpointId);
if (health.readiness === "ready") {
console.log(`✅ Endpoint ${endpointId} is ready!`);
return true;
}
if (health.status === "starting" && health.startupProgress) {
const progress = health.startupProgress;
console.log(`⏳ Starting: ${progress.phase} (${progress.progress}%)`);
} else {
console.log(`📊 Status: ${health.status}, Readiness: ${health.readiness}`);
}
await new Promise(resolve => setTimeout(resolve, pollInterval));
} catch (error) {
console.error(`Error checking readiness:`, error);
await new Promise(resolve => setTimeout(resolve, pollInterval));
}
}
console.error(`❌ Timeout waiting for ${endpointId} to become ready`);
return false;
}
// Batch health checking with Promise.all
async function checkMultipleEndpoints(endpointIds, options = {}) {
const { check = "basic", include = [] } = options;
try {
const healthChecks = await Promise.allSettled(
endpointIds.map(id =>
client.endpoints.getHealth(id, { check, include })
)
);
const results = {
healthy: 0,
degraded: 0,
unhealthy: 0,
errors: 0
};
healthChecks.forEach((result, index) => {
const endpointId = endpointIds[index];
if (result.status === "fulfilled") {
const health = result.value;
results[health.status] = (results[health.status] || 0) + 1;
console.log(`${endpointId}: ${health.status}`);
if (health.status !== "healthy") {
health.issues?.forEach(issue => {
console.log(` ⚠️ ${issue.severity}: ${issue.message}`);
});
}
} else {
results.errors++;
console.error(`${endpointId}: Error - ${result.reason}`);
}
});
console.log("\n📊 Health Summary:");
Object.entries(results).forEach(([status, count]) => {
if (count > 0) {
console.log(` ${status}: ${count}`);
}
});
return healthChecks;
} catch (error) {
console.error("Batch health check failed:", error);
throw error;
}
}
// Health-based auto-scaling trigger
async function autoScaleBasedOnHealth(endpointIds, options = {}) {
const {
scaleUpThreshold = 80, // GPU utilization %
scaleDownThreshold = 20,
minInstances = 1,
maxInstances = 10
} = options;
const healthResults = await Promise.all(
endpointIds.map(id => client.endpoints.getHealth(id, {
check: "detailed",
include: ["resources"]
}))
);
let scaleRecommendations = [];
healthResults.forEach(health => {
if (!health.resources?.gpu) return;
const gpuUtil = health.resources.gpu.utilization;
const memUtil = health.resources.gpu.memory.usage;
if (gpuUtil > scaleUpThreshold || memUtil > 90) {
scaleRecommendations.push({
endpointId: health.endpointId,
action: "scale_up",
reason: `High utilization: GPU ${gpuUtil}%, Memory ${memUtil}%`,
priority: memUtil > 95 ? "high" : "medium"
});
} else if (gpuUtil < scaleDownThreshold && memUtil < 30) {
scaleRecommendations.push({
endpointId: health.endpointId,
action: "scale_down",
reason: `Low utilization: GPU ${gpuUtil}%, Memory ${memUtil}%`,
priority: "low"
});
}
});
return scaleRecommendations;
}
// Usage examples
async function main() {
const endpointId = "ep_1234567890abcdef";
const endpointIds = ["ep_1234567890abcdef", "ep_2345678901bcdefg"];
try {
// Basic health check
await checkEndpointHealth(endpointId);
// Wait for readiness
const isReady = await waitForEndpointReady(endpointId);
if (isReady) {
console.log("Endpoint is ready for requests!");
}
// Check multiple endpoints
await checkMultipleEndpoints(endpointIds, {
check: "detailed",
include: ["resources"]
});
// Auto-scaling recommendations
const scaleRecs = await autoScaleBasedOnHealth(endpointIds);
if (scaleRecs.length > 0) {
console.log("Scaling recommendations:");
scaleRecs.forEach(rec => {
console.log(` ${rec.endpointId}: ${rec.action} - ${rec.reason}`);
});
}
// Start continuous monitoring (uncomment to run)
// const stopMonitoring = await monitorEndpointHealth(endpointId, {
// interval: 30000,
// alertThresholds: { gpuMemory: 85, gpuTemp: 75 }
// });
// Stop monitoring after 5 minutes
// setTimeout(stopMonitoring, 5 * 60 * 1000);
} catch (error) {
console.error("Health monitoring error:", error);
}
}
main();
Use Cases
Production Monitoring
- Service Reliability: Monitor endpoint health in production environments
- Automated Alerts: Set up alerting based on health status changes
- Load Balancing: Route traffic away from unhealthy endpoints
- Capacity Planning: Monitor resource utilization trends
CI/CD Integration
- Deployment Validation: Verify endpoint health after deployments
- Rollback Triggers: Automatically rollback on health failures
- Readiness Gates: Wait for endpoints to be ready before promoting traffic
- Health-based Testing: Run tests only when endpoints are healthy
Auto-scaling and Orchestration
- Kubernetes Integration: Use as readiness and liveness probes
- Auto-scaling Triggers: Scale based on resource health metrics
- Failover Systems: Detect failures and switch to backup endpoints
- Maintenance Windows: Schedule maintenance based on health patterns
Development and Debugging
- Performance Optimization: Identify performance bottlenecks
- Resource Monitoring: Track resource usage during development
- Dependency Validation: Ensure all dependencies are healthy
- Cold Start Analysis: Monitor startup performance and optimization
Best Practices
Health Check Strategy
- Regular Monitoring: Implement regular health checks with appropriate intervals
- Graduated Checks: Use basic checks for frequent monitoring, detailed for diagnostics
- Timeout Management: Set appropriate timeouts based on expected response times
- Error Handling: Implement graceful handling of health check failures
- Check Frequency: Balance monitoring frequency with system load
- Batch Operations: Use batch health checks for multiple endpoints
- Caching: Cache health results for non-critical monitoring (a minimal caching sketch follows this list)
- Selective Inclusion: Only request detailed metrics when needed
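As referenced in the Caching item above, a small client-side TTL cache keeps non-critical monitors from issuing redundant calls. A minimal sketch; the 30-second TTL mirrors the server-side caching noted later on this page, and the injected fetch function is a placeholder for whichever client you use:

```python
import time

class CachedHealthClient:
    """Client-side TTL cache for non-critical health monitoring.

    The 30-second default mirrors the server-side caching noted below, so a
    shorter TTL mostly adds load without adding freshness. fetch_health is
    any callable that takes an endpoint ID and returns the health dict.
    """

    def __init__(self, fetch_health, ttl_seconds=30.0):
        self._fetch_health = fetch_health
        self._ttl = ttl_seconds
        self._cache = {}  # endpoint_id -> (fetched_at, health)

    def get(self, endpoint_id):
        now = time.monotonic()
        cached = self._cache.get(endpoint_id)
        if cached and now - cached[0] < self._ttl:
            return cached[1]
        health = self._fetch_health(endpoint_id)
        self._cache[endpoint_id] = (now, health)
        return health
```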
Alert Configuration
- Threshold Setting: Set appropriate thresholds for different severity levels
- Alert Fatigue: Prevent alert fatigue with intelligent alerting
- Escalation Paths: Define clear escalation procedures for different issue types
- Recovery Actions: Implement automated recovery actions where appropriate
Integration Patterns
- Circuit Breakers: Use health status to trigger circuit breaker patterns (a minimal breaker sketch follows this list)
- Service Mesh: Integrate with service mesh health checking
- Monitoring Tools: Export health metrics to monitoring and observability tools
- Documentation: Document health check interpretations and response procedures
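As referenced in the Circuit Breakers item above, health status can drive a simple breaker that stops sending traffic after repeated unhealthy results and probes again after a cooldown. A minimal sketch; the thresholds, cooldown, and injected check_health callable are illustrative:

```python
import time

class HealthCircuitBreaker:
    """Stop routing to an endpoint after repeated unhealthy results.

    check_health is any callable returning the health dict shown earlier.
    The failure threshold and cooldown are illustrative; tune them to your
    traffic and recovery characteristics.
    """

    def __init__(self, check_health, failure_threshold=3, cooldown_seconds=60.0):
        self._check_health = check_health
        self._failure_threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._failures = 0
        self._opened_at = None

    def allow_request(self, endpoint_id):
        # While open, block traffic until the cooldown elapses, then allow one probe.
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self._cooldown:
                return False
            self._opened_at = None  # half-open: the next check decides

        status = self._check_health(endpoint_id).get("status")
        if status in ("healthy", "degraded"):
            self._failures = 0
            return True

        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._opened_at = time.monotonic()
        return False
```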
Health checks are cached for 30 seconds to reduce system load. For real-time status updates, use the streaming endpoints or webhook notifications.
Deep health checks consume more resources and should be used sparingly in production. Use basic or detailed checks for regular monitoring.
Set up automated recovery actions based on health status to reduce manual intervention and improve system reliability. Consider implementing circuit breaker patterns for improved resilience.
Authentication requires an API key passed in the Authorization header using the 'Bearer YOUR_API_KEY' format.
The response is of type object.