Get comprehensive status information for asynchronous endpoint executions, including progress tracking, resource usage, and execution metadata. This endpoint is essential for monitoring long-running tasks and batch processing operations.
## Path Parameters

- `jobId`: The unique identifier of the job to check status for.

## Query Parameters

- `include`: Optional array of additional fields to include (`logs`, `metrics`, `resources`).
- `format`: Response format (`json` or `summary`); defaults to `json`.
## Example Usage

### Basic Status Check

```bash
curl -X GET "https://api.tensorone.ai/v2/jobs/job_1234567890abcdef/status" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"
```

### Detailed Status with Logs and Metrics

```bash
curl -X GET "https://api.tensorone.ai/v2/jobs/job_1234567890abcdef/status?include=logs,metrics,resources" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"
```

### Multiple Job Status Check

```bash
curl -X POST "https://api.tensorone.ai/v2/jobs/status/batch" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jobIds": [
      "job_1234567890abcdef",
      "job_2345678901bcdefg",
      "job_3456789012cdefgh"
    ],
    "include": ["metrics"]
  }'
```
## Response

### Successful Response

```json
{
  "jobId": "job_1234567890abcdef",
  "status": "running",
  "progress": 75,
  "endpointId": "ep_image_processor",
  "priority": "high",
  "createdAt": "2024-01-15T14:30:00Z",
  "startedAt": "2024-01-15T14:31:15Z",
  "estimatedCompletion": "2024-01-15T14:45:30Z",
  "currentStep": "processing_images",
  "metadata": {
    "imagesProcessed": 750,
    "totalImages": 1000,
    "averageProcessingTime": 1.2,
    "currentBatch": 15,
    "totalBatches": 20
  },
  "resources": {
    "gpuType": "NVIDIA A100",
    "gpuUtilization": 85,
    "memoryUsage": "12.5GB",
    "memoryTotal": "40GB",
    "cpuUsage": 45
  },
  "execution": {
    "duration": 847,
    "costAccrued": 2.47,
    "tokensProcessed": 125000,
    "apiCalls": 1247
  },
  "tags": {
    "userId": "user_12345",
    "projectId": "proj_batch_processing",
    "category": "image_enhancement"
  }
}
```
### Completed Job Response

```json
{
  "jobId": "job_completed_example",
  "status": "completed",
  "progress": 100,
  "endpointId": "ep_text_generator",
  "createdAt": "2024-01-15T14:00:00Z",
  "startedAt": "2024-01-15T14:01:00Z",
  "completedAt": "2024-01-15T14:15:30Z",
  "executionTime": 870.5,
  "output": {
    "result": "Generated content successfully saved to storage",
    "outputUrl": "https://storage.tensorone.ai/outputs/doc_abc123.pdf",
    "size": "2.4MB",
    "format": "pdf"
  },
  "finalMetrics": {
    "documentsGenerated": 50,
    "wordsGenerated": 45000,
    "totalCost": 5.67,
    "averageLatency": 17.4
  },
  "resources": {
    "peakGpuUtilization": 92,
    "peakMemoryUsage": "18.2GB",
    "totalComputeTime": 852.3
  }
}
```
### Failed Job Response

```json
{
  "jobId": "job_failed_example",
  "status": "failed",
  "progress": 45,
  "endpointId": "ep_video_processor",
  "createdAt": "2024-01-15T13:30:00Z",
  "startedAt": "2024-01-15T13:31:00Z",
  "failedAt": "2024-01-15T13:45:22Z",
  "error": {
    "code": "RESOURCE_EXHAUSTED",
    "message": "GPU memory limit exceeded during processing",
    "details": {
      "step": "video_encoding",
      "memoryRequired": "45GB",
      "memoryAvailable": "40GB",
      "suggestion": "Reduce input resolution or use smaller batch size"
    }
  },
  "partialResults": {
    "processedFrames": 15420,
    "totalFrames": 34200,
    "outputFiles": [
      "https://storage.tensorone.ai/temp/partial_output_1.mp4"
    ]
  },
  "retryable": true,
  "retryCount": 1,
  "nextRetryAt": "2024-01-15T14:00:00Z"
}
```
## Status Values

### Primary States

- `queued`: Job submitted and waiting for available resources
- `initializing`: Job is starting up and loading resources
- `running`: Job is actively processing
- `completed`: Job finished successfully
- `failed`: Job encountered an error and stopped
- `cancelled`: Job was cancelled by the user or the system
- `timeout`: Job exceeded the maximum execution time
- `paused`: Job temporarily paused (manually or automatically)

### Substates for Running Jobs

- `warming_up`: Cold start; loading the model and dependencies
- `processing`: Actively processing input data
- `finalizing`: Completing processing and preparing output
- `uploading`: Transferring output to storage
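When polling, it helps to separate terminal states from in-flight ones so a loop knows when to stop. A small helper based on the states listed above (treating `timeout` as terminal is an assumption; adjust to your retry policy):

```python
TERMINAL_STATES = {"completed", "failed", "cancelled", "timeout"}
ACTIVE_STATES = {"queued", "initializing", "running", "paused"}

def is_terminal(status: str) -> bool:
    """Return True when a job can no longer change state on its own."""
    return status in TERMINAL_STATES
```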
## Progress Tracking

Jobs include detailed progress information:

- `progress`: Percentage completion (0-100)
- `currentStep`: Description of the current processing phase
- `estimatedCompletion`: Predicted completion timestamp
- `metadata`: Task-specific progress details
- `throughput`: Processing rate (items/second, tokens/second, etc.)
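When `estimatedCompletion` is absent, a rough ETA can be derived from `progress` and the elapsed execution time. A minimal sketch, assuming the `progress` field and the `execution.duration` field (in seconds) from the status payloads shown earlier:

```python
def estimate_remaining_seconds(status: dict) -> float | None:
    """Extrapolate remaining time from progress and elapsed duration."""
    progress = status.get("progress", 0)
    elapsed = status.get("execution", {}).get("duration")
    if not elapsed or progress <= 0:
        return None  # not enough signal to extrapolate
    return elapsed * (100 - progress) / progress
```

For the running-job example above (75% done after 847 seconds), this extrapolates roughly 282 seconds remaining.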
### Real-time Updates

```bash
# Use Server-Sent Events for real-time status updates
curl -X GET "https://api.tensorone.ai/v2/jobs/job_1234567890abcdef/status/stream" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/event-stream"
```
## Resource Monitoring

### GPU Utilization

Track GPU usage in real time:

```json
{
  "resources": {
    "gpus": [
      {
        "id": "gpu_0",
        "type": "NVIDIA A100",
        "utilization": 87,
        "memoryUsed": "35.2GB",
        "memoryTotal": "40GB",
        "temperature": 72,
        "powerUsage": "280W"
      }
    ],
    "cpu": {
      "utilization": 45,
      "cores": 16,
      "memory": "64GB"
    }
  }
}
```
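Payloads like the one above can drive simple alerting. A minimal sketch that flags hot or saturated GPUs, assuming the `resources.gpus` shape shown; the thresholds are arbitrary examples:

```python
def gpu_alerts(resources: dict, util_limit: int = 95, temp_limit: int = 80):
    """Yield human-readable warnings for GPUs over the given thresholds."""
    for gpu in resources.get("gpus", []):
        if gpu.get("utilization", 0) >= util_limit:
            yield f"{gpu['id']}: utilization at {gpu['utilization']}%"
        if gpu.get("temperature", 0) >= temp_limit:
            yield f"{gpu['id']}: temperature at {gpu['temperature']}C"
```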
### Cold Start Monitoring

Track cold start metrics for optimization:

```json
{
  "coldStart": {
    "occurred": true,
    "duration": 45.2,
    "phases": {
      "containerStart": 12.5,
      "modelLoad": 28.1,
      "dependencyLoad": 4.6
    },
    "cacheHit": false,
    "optimizationSuggestions": [
      "Consider using warm pools for frequently accessed models",
      "Optimize model size or use quantized versions"
    ]
  }
}
```
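Aggregating the `coldStart` block across runs shows whether cold starts are frequent enough to justify a warm pool. A minimal sketch over a list of status payloads, assuming the shape above:

```python
def cold_start_summary(statuses: list[dict]) -> dict:
    """Summarize cold-start frequency and average duration across jobs."""
    cold = [s["coldStart"] for s in statuses if s.get("coldStart", {}).get("occurred")]
    if not cold:
        return {"rate": 0.0, "avg_duration": 0.0}
    return {
        "rate": len(cold) / len(statuses),
        "avg_duration": sum(c["duration"] for c in cold) / len(cold),
    }
```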
## Error Handling

### 404 Not Found

```json
{
  "error": "JOB_NOT_FOUND",
  "message": "Job job_invalid does not exist or has expired",
  "details": {
    "jobId": "job_invalid",
    "possibleReasons": [
      "Job ID is incorrect",
      "Job was deleted after completion",
      "Job expired (older than 30 days)"
    ]
  }
}
```

### 403 Forbidden

```json
{
  "error": "ACCESS_DENIED",
  "message": "You don't have permission to view this job",
  "details": {
    "jobId": "job_1234567890abcdef",
    "requiredPermission": "jobs:read",
    "userPermissions": ["endpoints:execute"]
  }
}
```

### 429 Rate Limited

```json
{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Too many status check requests",
  "details": {
    "limit": 100,
    "window": "1m",
    "retryAfter": 30,
    "recommendation": "Use webhooks or SSE for real-time updates"
  }
}
```
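A polling loop should treat these errors differently: 404 and 403 will not resolve on retry, while 429 carries a `retryAfter` hint worth honoring. A minimal sketch with `requests`, assuming the error payloads shown above:

```python
import time
import requests

def get_status_with_handling(job_id: str, api_key: str, max_attempts: int = 5) -> dict:
    """Fetch job status, honoring rate limits and failing fast on access errors."""
    url = f"https://api.tensorone.ai/v2/jobs/{job_id}/status"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_attempts):
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code == 429:
            # Honor the server's retryAfter hint before trying again
            retry_after = resp.json().get("details", {}).get("retryAfter", 30)
            time.sleep(retry_after)
            continue
        if resp.status_code in (403, 404):
            # Permission and existence errors will not resolve on retry
            raise RuntimeError(resp.json().get("message", resp.reason))
        resp.raise_for_status()
        return resp.json()
    raise TimeoutError("Exceeded retry budget for status checks")
```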
## SDK Examples

### Python SDK

```python
from tensorone import TensorOneClient
import time
import asyncio

client = TensorOneClient(api_key="your_api_key")

# Basic status check
def check_job_status(job_id):
    status = client.jobs.get_status(job_id)
    print(f"Job {job_id}: {status.status} ({status.progress}%)")
    return status

# Real-time status monitoring
async def monitor_job_with_streaming(job_id):
    async for update in client.jobs.stream_status(job_id):
        print(f"Progress: {update.progress}% - {update.current_step}")
        if update.status == "completed":
            print(f"Job completed! Output: {update.output}")
            break
        elif update.status == "failed":
            print(f"Job failed: {update.error.message}")
            break

# Batch status checking
def check_multiple_jobs(job_ids):
    statuses = client.jobs.get_batch_status(
        job_ids=job_ids,
        include=["metrics", "resources"]
    )
    for status in statuses:
        print(f"Job {status.job_id}: {status.status}")
        if status.resources:
            print(f"  GPU Usage: {status.resources.gpu_utilization}%")
            print(f"  Memory: {status.resources.memory_usage}")

# Monitor with custom intervals
def monitor_job_with_backoff(job_id, max_wait=3600):
    intervals = [1, 2, 5, 10, 30, 60]  # Exponential backoff
    interval_index = 0
    start_time = time.time()
    while time.time() - start_time < max_wait:
        status = client.jobs.get_status(job_id, include=["resources"])
        print(f"Status: {status.status}, Progress: {status.progress}%")
        if status.status in ["completed", "failed", "cancelled"]:
            return status
        # Use exponential backoff
        sleep_time = intervals[min(interval_index, len(intervals) - 1)]
        time.sleep(sleep_time)
        interval_index += 1
    raise TimeoutError("Job monitoring timed out")

# Usage examples
if __name__ == "__main__":
    job_id = "job_1234567890abcdef"

    # Check current status
    current_status = check_job_status(job_id)

    # Monitor multiple jobs
    check_multiple_jobs([
        "job_1234567890abcdef",
        "job_2345678901bcdefg"
    ])

    # Stream real-time updates
    asyncio.run(monitor_job_with_streaming(job_id))
```
### JavaScript SDK

```javascript
import { TensorOneClient } from "@tensorone/sdk";

const client = new TensorOneClient({ apiKey: "your_api_key" });

// Basic status checking
async function checkJobStatus(jobId) {
  const status = await client.jobs.getStatus(jobId);
  console.log(`Job ${jobId}: ${status.status} (${status.progress}%)`);
  return status;
}

// Real-time monitoring with async iterators
async function monitorJobProgress(jobId) {
  for await (const update of client.jobs.streamStatus(jobId)) {
    console.log(`Progress: ${update.progress}% - ${update.currentStep}`);
    if (update.resources) {
      console.log(`GPU: ${update.resources.gpuUtilization}%, Memory: ${update.resources.memoryUsage}`);
    }
    if (update.status === "completed") {
      console.log("Job completed!", update.output);
      break;
    } else if (update.status === "failed") {
      console.error("Job failed:", update.error.message);
      break;
    }
  }
}

// Batch status checking
async function checkMultipleJobs(jobIds) {
  const statuses = await client.jobs.getBatchStatus({
    jobIds,
    include: ["metrics", "resources", "logs"]
  });
  statuses.forEach(status => {
    console.log(`Job ${status.jobId}: ${status.status}`);
    if (status.execution) {
      console.log(`  Cost: $${status.execution.costAccrued}`);
      console.log(`  Duration: ${status.execution.duration}s`);
    }
  });
}

// Monitor with promise-based polling
async function pollJobStatus(jobId, options = {}) {
  const { maxWait = 3600000, interval = 5000 } = options;
  const startTime = Date.now();
  while (Date.now() - startTime < maxWait) {
    const status = await client.jobs.getStatus(jobId, {
      include: ["resources", "metrics"]
    });
    console.log(`Status: ${status.status}, Progress: ${status.progress}%`);
    if (["completed", "failed", "cancelled"].includes(status.status)) {
      return status;
    }
    // Dynamic interval adjustment based on progress
    const dynamicInterval = status.progress > 90 ? 2000 : interval;
    await new Promise(resolve => setTimeout(resolve, dynamicInterval));
  }
  throw new Error("Job monitoring timed out");
}

// Usage examples
async function main() {
  const jobId = "job_1234567890abcdef";
  try {
    // Check current status
    const status = await checkJobStatus(jobId);

    // Monitor with different strategies
    if (status.status === "running") {
      // Use streaming for real-time updates
      await monitorJobProgress(jobId);
    } else if (status.status === "queued") {
      // Use polling for queued jobs
      await pollJobStatus(jobId);
    }

    // Check multiple jobs
    await checkMultipleJobs([
      "job_1234567890abcdef",
      "job_2345678901bcdefg"
    ]);
  } catch (error) {
    console.error("Error monitoring job:", error);
  }
}

main();
```
## Use Cases

### Production Monitoring

- Job Dashboards: Build real-time dashboards showing job progress and resource usage
- Alert Systems: Set up alerts for failed jobs or resource constraints
- Capacity Planning: Monitor resource utilization to optimize cluster sizing

### Batch Processing

- Progress Tracking: Monitor large batch jobs processing thousands of items
- Resource Optimization: Track GPU utilization to optimize batch sizes
- Cost Management: Monitor execution costs in real time

### Development and Testing

- Debugging: Monitor job execution to identify bottlenecks and errors
- Performance Tuning: Track cold start times and resource usage patterns
- Load Testing: Monitor system behavior under different load conditions
## Best Practices

### Monitoring Strategy

- Use Webhooks: Prefer webhooks over polling for better performance
- Exponential Backoff: Implement exponential backoff for polling to reduce API load
- Batch Requests: Check multiple job statuses in a single request when possible
- Real-time Streaming: Use SSE for real-time updates on critical jobs
- Selective Fields: Only request additional fields (`logs`, `metrics`) when needed
- Caching: Cache status responses for non-critical monitoring (see the sketch after this list)
- Rate Limiting: Respect rate limits to avoid throttling
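For the caching point above, a short-lived TTL cache keeps dashboards responsive without hammering the status endpoint. A minimal sketch; the `fetch_status` parameter stands in for any of the status calls shown on this page:

```python
import time

class StatusCache:
    """Cache status responses for a few seconds of non-critical monitoring."""

    def __init__(self, fetch_status, ttl_seconds: float = 10.0):
        self._fetch = fetch_status
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, job_id: str) -> dict:
        cached = self._entries.get(job_id)
        if cached and time.monotonic() - cached[0] < self._ttl:
            return cached[1]  # still fresh; skip the API call
        status = self._fetch(job_id)
        self._entries[job_id] = (time.monotonic(), status)
        return status
```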
### Error Handling

- Graceful Degradation: Handle temporary API failures gracefully
- Retry Logic: Implement retry logic for transient errors
- Timeout Management: Set appropriate timeouts for long-running jobs

### Cost Management

- Monitor Costs: Track the `costAccrued` field to prevent budget overruns (see the sketch after this list)
- Resource Alerts: Set up alerts when resource usage exceeds thresholds
- Optimization: Use status data to identify optimization opportunities
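For the cost point above, a small check against a per-job budget can short-circuit runaway jobs. A minimal sketch, assuming the `execution.costAccrued` field shown earlier; the `cancel` callback is a hypothetical cancellation hook:

```python
def enforce_budget(status: dict, budget_usd: float, cancel) -> bool:
    """Cancel a job once its accrued cost crosses the budget."""
    cost = status.get("execution", {}).get("costAccrued", 0.0)
    if cost >= budget_usd:
        cancel(status["jobId"])  # hypothetical cancellation hook
        return True
    return False
```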
Job status is updated in real time. Use streaming endpoints or webhooks for the most current information without overwhelming the API with polling requests.

Jobs older than 30 days are automatically deleted. Ensure you save important status information and outputs before they expire.

Use the `include` parameter strategically: only request additional data like `logs` and `metrics` when you actually need it, to keep responses fast and reduce bandwidth usage.