Get comprehensive status information for asynchronous endpoint executions, including progress tracking, resource usage, and execution metadata. This endpoint is essential for monitoring long-running tasks and batch processing operations.
## Path Parameters

- `jobId`: The unique identifier of the job to check status for.

## Query Parameters

- `include`: Optional array of additional fields to include (`logs`, `metrics`, `resources`).
- `format`: Response format (`json` or `summary`); defaults to `json`.
## Example Usage

### Basic Status Check

```bash
curl -X GET "https://api.tensorone.ai/v2/jobs/job_1234567890abcdef/status" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"
```

### Detailed Status with Logs and Metrics

```bash
curl -X GET "https://api.tensorone.ai/v2/jobs/job_1234567890abcdef/status?include=logs,metrics,resources" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"
```

### Multiple Job Status Check

```bash
curl -X POST "https://api.tensorone.ai/v2/jobs/status/batch" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jobIds": [
      "job_1234567890abcdef",
      "job_2345678901bcdefg",
      "job_3456789012cdefgh"
    ],
    "include": ["metrics"]
  }'
```
## Response

### Successful Response

```json
{
  "jobId": "job_1234567890abcdef",
  "status": "running",
  "progress": 75,
  "endpointId": "ep_image_processor",
  "priority": "high",
  "createdAt": "2024-01-15T14:30:00Z",
  "startedAt": "2024-01-15T14:31:15Z",
  "estimatedCompletion": "2024-01-15T14:45:30Z",
  "currentStep": "processing_images",
  "metadata": {
    "imagesProcessed": 750,
    "totalImages": 1000,
    "averageProcessingTime": 1.2,
    "currentBatch": 15,
    "totalBatches": 20
  },
  "resources": {
    "gpuType": "NVIDIA A100",
    "gpuUtilization": 85,
    "memoryUsage": "12.5GB",
    "memoryTotal": "40GB",
    "cpuUsage": 45
  },
  "execution": {
    "duration": 847,
    "costAccrued": 2.47,
    "tokensProcessed": 125000,
    "apiCalls": 1247
  },
  "tags": {
    "userId": "user_12345",
    "projectId": "proj_batch_processing",
    "category": "image_enhancement"
  }
}
```
### Completed Job Response

```json
{
  "jobId": "job_completed_example",
  "status": "completed",
  "progress": 100,
  "endpointId": "ep_text_generator",
  "createdAt": "2024-01-15T14:00:00Z",
  "startedAt": "2024-01-15T14:01:00Z",
  "completedAt": "2024-01-15T14:15:30Z",
  "executionTime": 870.5,
  "output": {
    "result": "Generated content successfully saved to storage",
    "outputUrl": "https://storage.tensorone.ai/outputs/doc_abc123.pdf",
    "size": "2.4MB",
    "format": "pdf"
  },
  "finalMetrics": {
    "documentsGenerated": 50,
    "wordsGenerated": 45000,
    "totalCost": 5.67,
    "averageLatency": 17.4
  },
  "resources": {
    "peakGpuUtilization": 92,
    "peakMemoryUsage": "18.2GB",
    "totalComputeTime": 852.3
  }
}
```
### Failed Job Response

```json
{
  "jobId": "job_failed_example",
  "status": "failed",
  "progress": 45,
  "endpointId": "ep_video_processor",
  "createdAt": "2024-01-15T13:30:00Z",
  "startedAt": "2024-01-15T13:31:00Z",
  "failedAt": "2024-01-15T13:45:22Z",
  "error": {
    "code": "RESOURCE_EXHAUSTED",
    "message": "GPU memory limit exceeded during processing",
    "details": {
      "step": "video_encoding",
      "memoryRequired": "45GB",
      "memoryAvailable": "40GB",
      "suggestion": "Reduce input resolution or use smaller batch size"
    }
  },
  "partialResults": {
    "processedFrames": 15420,
    "totalFrames": 34200,
    "outputFiles": [
      "https://storage.tensorone.ai/temp/partial_output_1.mp4"
    ]
  },
  "retryable": true,
  "retryCount": 1,
  "nextRetryAt": "2024-01-15T14:00:00Z"
}
```
## Status Values

### Primary States

- `queued`: Job submitted and waiting for available resources
- `initializing`: Job is starting up and loading resources
- `running`: Job is actively processing
- `completed`: Job finished successfully
- `failed`: Job encountered an error and stopped
- `cancelled`: Job was cancelled by the user or the system
- `timeout`: Job exceeded the maximum execution time
- `paused`: Job temporarily paused (manually or automatically)

### Substates for Running Jobs

- `warming_up`: Cold start; loading the model and dependencies
- `processing`: Actively processing input data
- `finalizing`: Completing processing and preparing output
- `uploading`: Transferring output to storage
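When polling, it helps to separate terminal states from in-flight ones so a loop knows when to stop. A small helper based on the states listed above (treating `timeout` as terminal is an assumption; adjust to your retry policy):

```python
TERMINAL_STATES = {"completed", "failed", "cancelled", "timeout"}
ACTIVE_STATES = {"queued", "initializing", "running", "paused"}

def is_terminal(status: str) -> bool:
    """Return True when a job can no longer change state on its own."""
    return status in TERMINAL_STATES
```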
## Progress Tracking

Jobs include detailed progress information:

- `progress`: Percentage completion (0-100)
- `currentStep`: Description of the current processing phase
- `estimatedCompletion`: Predicted completion timestamp
- `metadata`: Task-specific progress details
- `throughput`: Processing rate (items/second, tokens/second, etc.)
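When `estimatedCompletion` is absent, a rough ETA can be derived from `progress` and the elapsed execution time. A minimal sketch, assuming the `progress` field and the `execution.duration` field (in seconds) from the status payloads shown earlier:

```python
def estimate_remaining_seconds(status: dict) -> float | None:
    """Extrapolate remaining time from progress and elapsed duration."""
    progress = status.get("progress", 0)
    elapsed = status.get("execution", {}).get("duration")
    if not elapsed or progress <= 0:
        return None  # not enough signal to extrapolate
    return elapsed * (100 - progress) / progress
```

For the running-job example above (75% done after 847 seconds), this extrapolates roughly 282 seconds remaining.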
### Real-time Updates

```bash
# Use Server-Sent Events for real-time status updates
curl -X GET "https://api.tensorone.ai/v2/jobs/job_1234567890abcdef/status/stream" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/event-stream"
```
## Resource Monitoring

### GPU Utilization

Track GPU usage in real time:

```json
{
  "resources": {
    "gpus": [
      {
        "id": "gpu_0",
        "type": "NVIDIA A100",
        "utilization": 87,
        "memoryUsed": "35.2GB",
        "memoryTotal": "40GB",
        "temperature": 72,
        "powerUsage": "280W"
      }
    ],
    "cpu": {
      "utilization": 45,
      "cores": 16,
      "memory": "64GB"
    }
  }
}
```
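Payloads like the one above can drive simple alerting. A minimal sketch that flags hot or saturated GPUs, assuming the `resources.gpus` shape shown; the thresholds are arbitrary examples:

```python
def gpu_alerts(resources: dict, util_limit: int = 95, temp_limit: int = 80):
    """Yield human-readable warnings for GPUs over the given thresholds."""
    for gpu in resources.get("gpus", []):
        if gpu.get("utilization", 0) >= util_limit:
            yield f"{gpu['id']}: utilization at {gpu['utilization']}%"
        if gpu.get("temperature", 0) >= temp_limit:
            yield f"{gpu['id']}: temperature at {gpu['temperature']}C"
```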
### Cold Start Monitoring

Track cold start metrics for optimization:

```json
{
  "coldStart": {
    "occurred": true,
    "duration": 45.2,
    "phases": {
      "containerStart": 12.5,
      "modelLoad": 28.1,
      "dependencyLoad": 4.6
    },
    "cacheHit": false,
    "optimizationSuggestions": [
      "Consider using warm pools for frequently accessed models",
      "Optimize model size or use quantized versions"
    ]
  }
}
```
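Aggregating the `coldStart` block across runs shows whether cold starts are frequent enough to justify a warm pool. A minimal sketch over a list of status payloads, assuming the shape above:

```python
def cold_start_summary(statuses: list[dict]) -> dict:
    """Summarize cold-start frequency and average duration across jobs."""
    cold = [s["coldStart"] for s in statuses if s.get("coldStart", {}).get("occurred")]
    if not cold:
        return {"rate": 0.0, "avg_duration": 0.0}
    return {
        "rate": len(cold) / len(statuses),
        "avg_duration": sum(c["duration"] for c in cold) / len(cold),
    }
```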
## Error Handling

### 404 Not Found

```json
{
  "error": "JOB_NOT_FOUND",
  "message": "Job job_invalid does not exist or has expired",
  "details": {
    "jobId": "job_invalid",
    "possibleReasons": [
      "Job ID is incorrect",
      "Job was deleted after completion",
      "Job expired (older than 30 days)"
    ]
  }
}
```

### 403 Forbidden

```json
{
  "error": "ACCESS_DENIED",
  "message": "You don't have permission to view this job",
  "details": {
    "jobId": "job_1234567890abcdef",
    "requiredPermission": "jobs:read",
    "userPermissions": ["endpoints:execute"]
  }
}
```

### 429 Rate Limited

```json
{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Too many status check requests",
  "details": {
    "limit": 100,
    "window": "1m",
    "retryAfter": 30,
    "recommendation": "Use webhooks or SSE for real-time updates"
  }
}
```
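A polling loop should treat these errors differently: 404 and 403 will not resolve on retry, while 429 carries a `retryAfter` hint worth honoring. A minimal sketch with `requests`, assuming the error payloads shown above:

```python
import time
import requests

def get_status_with_handling(job_id: str, api_key: str, max_attempts: int = 5) -> dict:
    """Fetch job status, honoring rate limits and failing fast on access errors."""
    url = f"https://api.tensorone.ai/v2/jobs/{job_id}/status"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_attempts):
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code == 429:
            # Honor the server's retryAfter hint before trying again
            retry_after = resp.json().get("details", {}).get("retryAfter", 30)
            time.sleep(retry_after)
            continue
        if resp.status_code in (403, 404):
            # Permission and existence errors will not resolve on retry
            raise RuntimeError(resp.json().get("message", resp.reason))
        resp.raise_for_status()
        return resp.json()
    raise TimeoutError("Exceeded retry budget for status checks")
```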
## SDK Examples

### Python SDK

```python
from tensorone import TensorOneClient
import time
import asyncio

client = TensorOneClient(api_key="your_api_key")

# Basic status check
def check_job_status(job_id):
    status = client.jobs.get_status(job_id)
    print(f"Job {job_id}: {status.status} ({status.progress}%)")
    return status

# Real-time status monitoring
async def monitor_job_with_streaming(job_id):
    async for update in client.jobs.stream_status(job_id):
        print(f"Progress: {update.progress}% - {update.current_step}")
        if update.status == "completed":
            print(f"Job completed! Output: {update.output}")
            break
        elif update.status == "failed":
            print(f"Job failed: {update.error.message}")
            break

# Batch status checking
def check_multiple_jobs(job_ids):
    statuses = client.jobs.get_batch_status(
        job_ids=job_ids,
        include=["metrics", "resources"]
    )
    for status in statuses:
        print(f"Job {status.job_id}: {status.status}")
        if status.resources:
            print(f"  GPU Usage: {status.resources.gpu_utilization}%")
            print(f"  Memory: {status.resources.memory_usage}")

# Monitor with custom intervals
def monitor_job_with_backoff(job_id, max_wait=3600):
    intervals = [1, 2, 5, 10, 30, 60]  # Exponential backoff
    interval_index = 0
    start_time = time.time()
    while time.time() - start_time < max_wait:
        status = client.jobs.get_status(job_id, include=["resources"])
        print(f"Status: {status.status}, Progress: {status.progress}%")
        if status.status in ["completed", "failed", "cancelled"]:
            return status
        # Use exponential backoff
        sleep_time = intervals[min(interval_index, len(intervals) - 1)]
        time.sleep(sleep_time)
        interval_index += 1
    raise TimeoutError("Job monitoring timed out")

# Usage examples
if __name__ == "__main__":
    job_id = "job_1234567890abcdef"

    # Check current status
    current_status = check_job_status(job_id)

    # Monitor multiple jobs
    check_multiple_jobs([
        "job_1234567890abcdef",
        "job_2345678901bcdefg"
    ])

    # Stream real-time updates
    asyncio.run(monitor_job_with_streaming(job_id))
```
### JavaScript SDK

```javascript
import { TensorOneClient } from "@tensorone/sdk";

const client = new TensorOneClient({ apiKey: "your_api_key" });

// Basic status checking
async function checkJobStatus(jobId) {
  const status = await client.jobs.getStatus(jobId);
  console.log(`Job ${jobId}: ${status.status} (${status.progress}%)`);
  return status;
}

// Real-time monitoring with async iterators
async function monitorJobProgress(jobId) {
  for await (const update of client.jobs.streamStatus(jobId)) {
    console.log(`Progress: ${update.progress}% - ${update.currentStep}`);
    if (update.resources) {
      console.log(`GPU: ${update.resources.gpuUtilization}%, Memory: ${update.resources.memoryUsage}`);
    }
    if (update.status === "completed") {
      console.log("Job completed!", update.output);
      break;
    } else if (update.status === "failed") {
      console.error("Job failed:", update.error.message);
      break;
    }
  }
}

// Batch status checking
async function checkMultipleJobs(jobIds) {
  const statuses = await client.jobs.getBatchStatus({
    jobIds,
    include: ["metrics", "resources", "logs"]
  });
  statuses.forEach(status => {
    console.log(`Job ${status.jobId}: ${status.status}`);
    if (status.execution) {
      console.log(`  Cost: $${status.execution.costAccrued}`);
      console.log(`  Duration: ${status.execution.duration}s`);
    }
  });
}

// Monitor with promise-based polling
async function pollJobStatus(jobId, options = {}) {
  const { maxWait = 3600000, interval = 5000 } = options;
  const startTime = Date.now();
  while (Date.now() - startTime < maxWait) {
    const status = await client.jobs.getStatus(jobId, {
      include: ["resources", "metrics"]
    });
    console.log(`Status: ${status.status}, Progress: ${status.progress}%`);
    if (["completed", "failed", "cancelled"].includes(status.status)) {
      return status;
    }
    // Dynamic interval adjustment based on progress
    const dynamicInterval = status.progress > 90 ? 2000 : interval;
    await new Promise(resolve => setTimeout(resolve, dynamicInterval));
  }
  throw new Error("Job monitoring timed out");
}

// Usage examples
async function main() {
  const jobId = "job_1234567890abcdef";
  try {
    // Check current status
    const status = await checkJobStatus(jobId);

    // Monitor with different strategies
    if (status.status === "running") {
      // Use streaming for real-time updates
      await monitorJobProgress(jobId);
    } else if (status.status === "queued") {
      // Use polling for queued jobs
      await pollJobStatus(jobId);
    }

    // Check multiple jobs
    await checkMultipleJobs([
      "job_1234567890abcdef",
      "job_2345678901bcdefg"
    ]);
  } catch (error) {
    console.error("Error monitoring job:", error);
  }
}

main();
```
## Use Cases

### Production Monitoring

- Job Dashboards: Build real-time dashboards showing job progress and resource usage
- Alert Systems: Set up alerts for failed jobs or resource constraints
- Capacity Planning: Monitor resource utilization to optimize cluster sizing

### Batch Processing

- Progress Tracking: Monitor large batch jobs processing thousands of items
- Resource Optimization: Track GPU utilization to optimize batch sizes
- Cost Management: Monitor execution costs in real time

### Development and Testing

- Debugging: Monitor job execution to identify bottlenecks and errors
- Performance Tuning: Track cold start times and resource usage patterns
- Load Testing: Monitor system behavior under different load conditions
## Best Practices

### Monitoring Strategy

- Use Webhooks: Prefer webhooks over polling for better performance
- Exponential Backoff: Implement exponential backoff for polling to reduce API load
- Batch Requests: Check multiple job statuses in a single request when possible
- Real-time Streaming: Use SSE for real-time updates on critical jobs
- Selective Fields: Only request additional fields (`logs`, `metrics`) when needed
- Caching: Cache status responses for non-critical monitoring (see the sketch after this list)
- Rate Limiting: Respect rate limits to avoid throttling
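For the caching point above, a short-lived TTL cache keeps dashboards responsive without hammering the status endpoint. A minimal sketch; the `fetch_status` parameter stands in for any of the status calls shown on this page:

```python
import time

class StatusCache:
    """Cache status responses for a few seconds of non-critical monitoring."""

    def __init__(self, fetch_status, ttl_seconds: float = 10.0):
        self._fetch = fetch_status
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, job_id: str) -> dict:
        cached = self._entries.get(job_id)
        if cached and time.monotonic() - cached[0] < self._ttl:
            return cached[1]  # still fresh; skip the API call
        status = self._fetch(job_id)
        self._entries[job_id] = (time.monotonic(), status)
        return status
```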
### Error Handling

- Graceful Degradation: Handle temporary API failures gracefully
- Retry Logic: Implement retry logic for transient errors
- Timeout Management: Set appropriate timeouts for long-running jobs

### Cost Management

- Monitor Costs: Track the `costAccrued` field to prevent budget overruns (see the sketch after this list)
- Resource Alerts: Set up alerts when resource usage exceeds thresholds
- Optimization: Use status data to identify optimization opportunities
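For the cost point above, a small check against a per-job budget can short-circuit runaway jobs. A minimal sketch, assuming the `execution.costAccrued` field shown earlier; the `cancel` callback is a hypothetical cancellation hook:

```python
def enforce_budget(status: dict, budget_usd: float, cancel) -> bool:
    """Cancel a job once its accrued cost crosses the budget."""
    cost = status.get("execution", {}).get("costAccrued", 0.0)
    if cost >= budget_usd:
        cancel(status["jobId"])  # hypothetical cancellation hook
        return True
    return False
```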
Job status is updated in real time. Use streaming endpoints or webhooks for the most current information without overwhelming the API with polling requests.

Jobs older than 30 days are automatically deleted. Ensure you save important status information and outputs before they expire.

Use the `include` parameter strategically: only request additional data like `logs` and `metrics` when you actually need it, to keep responses fast and reduce bandwidth usage.