Get Endpoint Logs
curl --request GET \
  --url https://api.tensorone.ai/v2/endpoints/{endpointId}/logs \
  --header 'Authorization: Bearer <api-key>'
{
  "logs": [
    "<string>"
  ]
}
Access comprehensive execution logs and debugging information for your serverless endpoints. Monitor application behavior, troubleshoot issues, and gain insights into execution patterns with structured logging and real-time log streaming.

Path Parameters

  • endpointId: The unique identifier of the endpoint to retrieve logs for

Query Parameters

  • startTime: Start time for log retrieval (ISO 8601 format, e.g., 2024-01-15T10:30:00Z)
  • endTime: End time for log retrieval (ISO 8601 format, defaults to current time)
  • level: Minimum log level to include (debug, info, warn, error, fatal) - defaults to info
  • limit: Maximum number of log entries to return (1-10000) - defaults to 1000
  • offset: Number of log entries to skip for pagination - defaults to 0 (see the pagination sketch after this list)
  • jobId: Filter logs for a specific job execution
  • search: Search term to filter log messages (supports regex)
  • format: Response format (json, text, csv) - defaults to json
  • stream: Enable real-time log streaming (true, false) - defaults to false
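
The limit and offset parameters pair with the pagination block in the JSON response (hasMore, nextOffset) to walk through large result sets. Below is a minimal pagination sketch using Python's requests library; the field names follow the example response later on this page, and the API key and endpoint ID are placeholders.

import requests

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT_ID = "ep_1234567890abcdef"
URL = f"https://api.tensorone.ai/v2/endpoints/{ENDPOINT_ID}/logs"

def fetch_all_logs(level="info", page_size=1000):
    """Page through logs with limit/offset until hasMore is false."""
    offset = 0
    entries = []
    while True:
        resp = requests.get(
            URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={"level": level, "limit": page_size, "offset": offset},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        entries.extend(page["logs"])
        if not page["pagination"]["hasMore"]:
            return entries
        offset = page["pagination"]["nextOffset"]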

Example Usage

Basic Log Retrieval

curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

Filtered Logs with Time Range

curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?startTime=2024-01-15T10:00:00Z&endTime=2024-01-15T11:00:00Z&level=error" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

Job-Specific Logs

curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?jobId=job_1234567890abcdef&level=debug" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

Search Logs with Pattern

curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?search=memory.*error&level=warn" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

Real-time Log Streaming

curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?stream=true" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/event-stream"

Export Logs in CSV Format

curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?format=csv&startTime=2024-01-15T00:00:00Z&endTime=2024-01-15T23:59:59Z" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/csv"

Response

JSON Format Response

{
    "endpointId": "ep_1234567890abcdef",
    "totalLogs": 1247,
    "filteredLogs": 856,
    "startTime": "2024-01-15T10:00:00Z",
    "endTime": "2024-01-15T11:00:00Z",
    "pagination": {
        "limit": 1000,
        "offset": 0,
        "hasMore": false,
        "nextOffset": null
    },
    "logs": [
        {
            "timestamp": "2024-01-15T10:30:15.234Z",
            "level": "info",
            "message": "Processing image generation request",
            "jobId": "job_1234567890abcdef",
            "requestId": "req_abc123def456",
            "source": "model_inference",
            "metadata": {
                "prompt": "A sunset over mountains",
                "model": "stable-diffusion-xl",
                "parameters": {
                    "steps": 30,
                    "guidance_scale": 7.5,
                    "width": 1024,
                    "height": 1024
                }
            },
            "duration": null,
            "memoryUsage": "8.2GB",
            "gpuUtilization": 87
        },
        {
            "timestamp": "2024-01-15T10:30:18.567Z",
            "level": "debug",
            "message": "Model loaded successfully",
            "jobId": "job_1234567890abcdef",
            "requestId": "req_abc123def456",
            "source": "model_loader",
            "metadata": {
                "modelPath": "/models/stable-diffusion-xl/model.safetensors",
                "loadTime": 3.2,
                "modelSize": "6.9GB",
                "precision": "fp16"
            },
            "duration": 3.2,
            "memoryUsage": "6.9GB",
            "gpuUtilization": 45
        },
        {
            "timestamp": "2024-01-15T10:30:22.891Z",
            "level": "info",
            "message": "Image generation completed",
            "jobId": "job_1234567890abcdef",
            "requestId": "req_abc123def456",
            "source": "model_inference",
            "metadata": {
                "outputPath": "/tmp/generated_image_abc123.png",
                "generationTime": 4.3,
                "seed": 42,
                "finalSteps": 30
            },
            "duration": 4.3,
            "memoryUsage": "8.2GB",
            "gpuUtilization": 92
        },
        {
            "timestamp": "2024-01-15T10:30:25.123Z",
            "level": "warn",
            "message": "High GPU memory usage detected",
            "jobId": "job_1234567890abcdef",
            "requestId": "req_abc123def456",
            "source": "resource_monitor",
            "metadata": {
                "currentUsage": "38.5GB",
                "totalMemory": "40GB",
                "utilizationPercent": 96.25,
                "recommendation": "Consider reducing batch size or image resolution"
            },
            "duration": null,
            "memoryUsage": "38.5GB",
            "gpuUtilization": 96
        },
        {
            "timestamp": "2024-01-15T10:30:28.456Z",
            "level": "error",
            "message": "Failed to upload result to storage",
            "jobId": "job_1234567890abcdef",
            "requestId": "req_abc123def456",
            "source": "storage_uploader",
            "metadata": {
                "error": "Connection timeout after 30 seconds",
                "errorCode": "STORAGE_TIMEOUT",
                "retryAttempt": 1,
                "maxRetries": 3,
                "filePath": "/tmp/generated_image_abc123.png",
                "fileSize": "2.4MB"
            },
            "duration": 30.0,
            "stackTrace": [
                "at StorageUploader.upload (storage.js:45:12)",
                "at async ImageProcessor.saveResult (processor.js:128:8)",
                "at async handleRequest (handler.js:67:4)"
            ]
        }
    ],
    "summary": {
        "logLevels": {
            "debug": 234,
            "info": 456,
            "warn": 123,
            "error": 43,
            "fatal": 0
        },
        "sources": {
            "model_inference": 245,
            "model_loader": 67,
            "resource_monitor": 156,
            "storage_uploader": 89,
            "api_handler": 299
        },
        "commonErrors": [
            {
                "error": "STORAGE_TIMEOUT",
                "count": 12,
                "firstOccurrence": "2024-01-15T10:15:30Z",
                "lastOccurrence": "2024-01-15T10:55:12Z"
            },
            {
                "error": "MEMORY_LIMIT_EXCEEDED",
                "count": 8,
                "firstOccurrence": "2024-01-15T10:22:45Z",
                "lastOccurrence": "2024-01-15T10:48:30Z"
            }
        ]
    }
}

Error Logs with Stack Traces

{
    "timestamp": "2024-01-15T10:35:42.789Z",
    "level": "error",
    "message": "Model inference failed with CUDA out of memory error",
    "jobId": "job_error_example",
    "requestId": "req_error_456",
    "source": "model_inference",
    "metadata": {
        "error": "CUDA out of memory. Tried to allocate 2.50 GiB",
        "errorCode": "CUDA_OOM",
        "gpuMemoryUsed": "39.2GB",
        "gpuMemoryTotal": "40GB",
        "requestedAllocation": "2.5GB",
        "model": "llama-2-70b",
        "batchSize": 4,
        "sequenceLength": 2048
    },
    "stackTrace": [
        "RuntimeError: CUDA out of memory. Tried to allocate 2.50 GiB",
        "  at torch.cuda.OutOfMemoryError",
        "  at model_inference.py:156 in forward()",
        "  at inference_handler.py:89 in process_batch()",
        "  at main.py:45 in handle_request()"
    ],
    "context": {
        "previousRequests": [
            {
                "requestId": "req_456789",
                "memoryUsage": "35.8GB",
                "status": "completed"
            },
            {
                "requestId": "req_567890",
                "memoryUsage": "37.1GB",
                "status": "completed"
            }
        ],
        "systemState": {
            "availableMemory": "0.8GB",
            "activeProcesses": 3,
            "cacheSize": "12.4GB"
        }
    },
    "recoveryActions": [
        {
            "type": "memory_cleanup",
            "description": "Clear model cache and retry",
            "executed": true,
            "result": "freed 8.2GB memory"
        },
        {
            "type": "batch_size_reduction",
            "description": "Reduce batch size from 4 to 2",
            "executed": true,
            "result": "retry successful"
        }
    ]
}

Performance Logs

{
    "timestamp": "2024-01-15T10:40:15.123Z",
    "level": "info",
    "message": "Request processing completed",
    "jobId": "job_perf_example",
    "requestId": "req_perf_789",
    "source": "performance_tracker",
    "metadata": {
        "totalDuration": 12.5,
        "phases": {
            "queueTime": 0.2,
            "coldStartTime": 0.0,
            "modelLoadTime": 0.0,
            "inferenceTime": 11.8,
            "postProcessingTime": 0.3,
            "uploadTime": 0.2
        },
        "resourceUsage": {
            "peakGpuMemory": "32.1GB",
            "peakGpuUtilization": 94,
            "avgCpuUsage": 45,
            "networkIO": {
                "ingress": "125MB",
                "egress": "8.2MB"
            }
        },
        "optimizations": {
            "cacheHit": true,
            "modelReused": true,
            "batchProcessed": false
        }
    },
    "benchmarks": {
        "targetLatency": 10.0,
        "actualLatency": 12.5,
        "performance": "within_sla",
        "percentile": "p85"
    }
}
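
The phases block accounts for the full totalDuration (the values above sum to 12.5s), so attributing latency is a dictionary lookup; a small sketch:

def dominant_phase(perf_log):
    """Return the phase that contributed most to totalDuration."""
    phases = perf_log["metadata"]["phases"]
    name = max(phases, key=phases.get)
    return name, phases[name]

# For the example above: ("inferenceTime", 11.8), about 94% of the 12.5s total.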

Text Format Response

2024-01-15T10:30:15.234Z [INFO] model_inference: Processing image generation request (job_1234567890abcdef)
2024-01-15T10:30:18.567Z [DEBUG] model_loader: Model loaded successfully in 3.2s (job_1234567890abcdef)
2024-01-15T10:30:22.891Z [INFO] model_inference: Image generation completed in 4.3s (job_1234567890abcdef)
2024-01-15T10:30:25.123Z [WARN] resource_monitor: High GPU memory usage detected: 96.25% (job_1234567890abcdef)
2024-01-15T10:30:28.456Z [ERROR] storage_uploader: Failed to upload result to storage: Connection timeout (job_1234567890abcdef)

Real-time Streaming Response

data: {"timestamp":"2024-01-15T10:45:00.123Z","level":"info","message":"New request received","jobId":"job_live_stream","source":"api_handler"}

data: {"timestamp":"2024-01-15T10:45:01.456Z","level":"debug","message":"Loading model weights","jobId":"job_live_stream","source":"model_loader"}

data: {"timestamp":"2024-01-15T10:45:05.789Z","level":"info","message":"Model ready for inference","jobId":"job_live_stream","source":"model_loader"}

data: {"timestamp":"2024-01-15T10:45:08.012Z","level":"info","message":"Processing completed successfully","jobId":"job_live_stream","source":"model_inference"}

Log Levels

Level Hierarchy

  • debug: Detailed diagnostic information for development and troubleshooting
  • info: General informational messages about normal operation
  • warn: Warning messages for potentially problematic situations
  • error: Error messages for failures that don’t stop execution
  • fatal: Critical errors that cause execution to stop
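
When filtering client-side (for example, over entries you have already exported), the hierarchy above maps naturally onto numeric ranks; a small sketch:

# Severity ranks mirroring the hierarchy above.
LEVEL_RANK = {"debug": 0, "info": 1, "warn": 2, "error": 3, "fatal": 4}

def at_least(entries, minimum="warn"):
    """Keep dict-style log entries at or above the given severity."""
    floor = LEVEL_RANK[minimum]
    return [e for e in entries if LEVEL_RANK[e["level"]] >= floor]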

Log Sources

  • api_handler: API request handling and routing
  • model_loader: Model loading and initialization
  • model_inference: Model execution and inference
  • resource_monitor: System resource monitoring
  • storage_uploader: File upload and storage operations
  • cache_manager: Caching system operations
  • auto_scaler: Auto-scaling events and decisions

Advanced Filtering

Search Patterns

# Search for memory-related errors
# (regex metacharacters such as "|" should be URL-encoded, e.g. %7C, if your client does not encode them)
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?search=memory.*error|OOM|out.*memory" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for specific model operations
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?search=model_(load|unload|inference)" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for performance issues
curl -X GET "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef/logs?search=timeout|slow|latency|performance" \
  -H "Authorization: Bearer YOUR_API_KEY"

Complex Filtering

curl -X POST "https://api.tensorone.ai/v2/endpoints/logs/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "endpointIds": ["ep_1234567890abcdef"],
    "filters": {
      "timeRange": {
        "start": "2024-01-15T10:00:00Z",
        "end": "2024-01-15T11:00:00Z"
      },
      "levels": ["warn", "error", "fatal"],
      "sources": ["model_inference", "resource_monitor"],
      "search": {
        "query": "memory.*usage|GPU.*utilization",
        "caseSensitive": false
      },
      "metadata": {
        "gpuUtilization": {"$gt": 90},
        "memoryUsage": {"$regex": "3[0-9]\\.[0-9]GB"}
      }
    },
    "sort": {
      "field": "timestamp",
      "order": "desc"
    },
    "limit": 500
  }'
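
The same search can be issued from Python. The payload below mirrors the curl body exactly; treating the matching entries as arriving under a logs key is an assumption carried over from the GET response shape, so verify it against your actual response.

import requests

API_KEY = "YOUR_API_KEY"  # placeholder

payload = {
    "endpointIds": ["ep_1234567890abcdef"],
    "filters": {
        "timeRange": {"start": "2024-01-15T10:00:00Z", "end": "2024-01-15T11:00:00Z"},
        "levels": ["warn", "error", "fatal"],
        "sources": ["model_inference", "resource_monitor"],
        "search": {"query": "memory.*usage|GPU.*utilization", "caseSensitive": False},
        "metadata": {
            "gpuUtilization": {"$gt": 90},
            "memoryUsage": {"$regex": r"3[0-9]\.[0-9]GB"},
        },
    },
    "sort": {"field": "timestamp", "order": "desc"},
    "limit": 500,
}

resp = requests.post(
    "https://api.tensorone.ai/v2/endpoints/logs/search",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
for entry in resp.json().get("logs", []):  # assumed response key
    print(entry["timestamp"], entry["level"], entry["message"])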

Log Aggregation

Log Aggregation Endpoint

curl -X POST "https://api.tensorone.ai/v2/endpoints/logs/aggregate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "endpointIds": ["ep_1234567890abcdef"],
    "timeRange": {
      "start": "2024-01-15T00:00:00Z",
      "end": "2024-01-15T23:59:59Z"
    },
    "aggregations": [
      {
        "name": "errors_by_hour",
        "groupBy": ["level", "hour"],
        "filters": {"level": ["error", "fatal"]},
        "metrics": ["count", "unique_jobs"]
      },
      {
        "name": "performance_metrics",
        "groupBy": ["source"],
        "metrics": ["avg_duration", "p95_duration", "count"]
      }
    ]
  }'

Aggregation Response

{
    "aggregations": {
        "errors_by_hour": [
            {
                "level": "error",
                "hour": "2024-01-15T10:00:00Z",
                "count": 23,
                "unique_jobs": 18
            },
            {
                "level": "error",
                "hour": "2024-01-15T11:00:00Z",
                "count": 15,
                "unique_jobs": 12
            }
        ],
        "performance_metrics": [
            {
                "source": "model_inference",
                "count": 1247,
                "avg_duration": 8.5,
                "p95_duration": 15.2
            },
            {
                "source": "model_loader",
                "count": 89,
                "avg_duration": 12.3,
                "p95_duration": 25.8
            }
        ]
    }
}
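
A typical use of this output is spotting error spikes; a short sketch over the errors_by_hour buckets shown above:

def flag_error_spikes(aggregation_response, threshold=20):
    """Return (hour, count) pairs whose error count exceeds the threshold."""
    buckets = aggregation_response["aggregations"]["errors_by_hour"]
    return [
        (b["hour"], b["count"])
        for b in buckets
        if b["level"] in ("error", "fatal") and b["count"] > threshold
    ]

# With the example response above, flag_error_spikes(resp, threshold=20)
# returns [("2024-01-15T10:00:00Z", 23)].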

Error Handling

400 Bad Request

{
    "error": "INVALID_TIME_RANGE",
    "message": "Start time must be before end time",
    "details": {
        "startTime": "2024-01-15T12:00:00Z",
        "endTime": "2024-01-15T10:00:00Z",
        "maxTimeRange": "24h"
    }
}

403 Forbidden

{
    "error": "INSUFFICIENT_PERMISSIONS",
    "message": "Logs access requires logs:read permission",
    "details": {
        "requiredPermission": "logs:read",
        "currentPermissions": ["endpoints:execute"]
    }
}

413 Payload Too Large

{
    "error": "TOO_MANY_LOGS",
    "message": "Requested log range contains too many entries",
    "details": {
        "requestedCount": 50000,
        "maxAllowed": 10000,
        "suggestion": "Use smaller time ranges or increase pagination"
    }
}

429 Rate Limited

{
    "error": "RATE_LIMIT_EXCEEDED",
    "message": "Too many log requests",
    "details": {
        "limit": 100,
        "window": "1h",
        "retryAfter": 60,
        "suggestion": "Use streaming for real-time logs"
    }
}
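
Clients should back off when this error appears. A minimal sketch that honors the documented retryAfter hint, falling back to exponential backoff when the body cannot be parsed:

import time
import requests

def get_logs_with_backoff(url, api_key, params=None, max_attempts=5):
    """GET with retries on 429, sleeping for the server-suggested delay."""
    for attempt in range(max_attempts):
        resp = requests.get(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            params=params,
            timeout=30,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        try:
            delay = resp.json()["details"]["retryAfter"]
        except (ValueError, KeyError):
            delay = 2 ** attempt  # fallback: exponential backoff
        time.sleep(delay)
    raise RuntimeError("rate limit retries exhausted")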

SDK Examples

Python SDK

from tensorone import TensorOneClient
import time
from datetime import datetime, timedelta
import pandas as pd

client = TensorOneClient(api_key="your_api_key")

# Basic log retrieval
def get_endpoint_logs(endpoint_id, hours_back=1):
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(hours=hours_back)
    
    logs = client.endpoints.get_logs(
        endpoint_id=endpoint_id,
        start_time=start_time.isoformat() + 'Z',
        end_time=end_time.isoformat() + 'Z',
        level='info',
        limit=1000
    )
    
    print(f"Retrieved {len(logs.logs)} logs for {endpoint_id}")
    
    # Display summary
    summary = logs.summary
    print(f"Log levels: {summary.log_levels}")
    print(f"Sources: {summary.sources}")
    
    return logs

# Error analysis
def analyze_errors(endpoint_id, hours_back=24):
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(hours=hours_back)
    
    # Get error logs
    error_logs = client.endpoints.get_logs(
        endpoint_id=endpoint_id,
        start_time=start_time.isoformat() + 'Z',
        end_time=end_time.isoformat() + 'Z',
        level='error',
        limit=5000
    )
    
    print(f"Error Analysis for {endpoint_id}")
    print(f"Total errors in last {hours_back} hours: {len(error_logs.logs)}")
    
    # Analyze error patterns
    error_codes = {}
    error_sources = {}
    
    for log in error_logs.logs:
        # Count error codes
        if log.metadata and 'errorCode' in log.metadata:
            code = log.metadata['errorCode']
            error_codes[code] = error_codes.get(code, 0) + 1
        
        # Count error sources
        source = log.source
        error_sources[source] = error_sources.get(source, 0) + 1
    
    print("\nTop Error Codes:")
    for code, count in sorted(error_codes.items(), key=lambda x: x[1], reverse=True):
        print(f"  {code}: {count}")
    
    print("\nErrors by Source:")
    for source, count in sorted(error_sources.items(), key=lambda x: x[1], reverse=True):
        print(f"  {source}: {count}")
    
    return error_logs

# Performance analysis from logs
def analyze_performance(endpoint_id, hours_back=6):
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(hours=hours_back)
    
    # Get performance-related logs
    perf_logs = client.endpoints.get_logs(
        endpoint_id=endpoint_id,
        start_time=start_time.isoformat() + 'Z',
        end_time=end_time.isoformat() + 'Z',
        search='duration|latency|performance|completed',
        level='info'
    )
    
    durations = []
    memory_usage = []
    gpu_utilization = []
    
    for log in perf_logs.logs:
        if log.duration:
            durations.append(log.duration)
        
        # memoryUsage and gpuUtilization are top-level log fields (see the JSON response above)
        if log.memory_usage and 'GB' in log.memory_usage:
            memory_usage.append(float(log.memory_usage.replace('GB', '')))
        
        if log.gpu_utilization is not None:
            gpu_utilization.append(log.gpu_utilization)
    
    if durations:
        print(f"Performance Analysis:")
        print(f"  Average Duration: {sum(durations)/len(durations):.2f}s")
        print(f"  Min Duration: {min(durations):.2f}s")
        print(f"  Max Duration: {max(durations):.2f}s")
    
    if memory_usage:
        print(f"  Average Memory Usage: {sum(memory_usage)/len(memory_usage):.1f}GB")
        print(f"  Peak Memory Usage: {max(memory_usage):.1f}GB")
    
    if gpu_utilization:
        print(f"  Average GPU Utilization: {sum(gpu_utilization)/len(gpu_utilization):.1f}%")
        print(f"  Peak GPU Utilization: {max(gpu_utilization):.1f}%")
    
    return {
        'durations': durations,
        'memory_usage': memory_usage,
        'gpu_utilization': gpu_utilization
    }

# Real-time log monitoring
def monitor_logs_realtime(endpoint_id, callback=None):
    """Monitor logs in real-time using streaming"""
    
    def default_callback(log_entry):
        timestamp = log_entry['timestamp']
        level = log_entry['level']
        message = log_entry['message']
        source = log_entry.get('source', 'unknown')
        
        print(f"[{timestamp}] {level.upper()}: {message} ({source})")
        
        # Alert on errors
        if level in ['error', 'fatal']:
            print(f"🚨 ALERT: {level.upper()} in {source}")
            if log_entry.get('metadata', {}).get('errorCode'):
                print(f"   Error Code: {log_entry['metadata']['errorCode']}")
    
    callback = callback or default_callback
    
    try:
        # Start streaming logs
        for log_entry in client.endpoints.stream_logs(endpoint_id):
            callback(log_entry)
            
    except KeyboardInterrupt:
        print("\nStopping log monitoring...")
    except Exception as e:
        print(f"Error in log monitoring: {e}")

# Log export and analysis
def export_logs_to_dataframe(endpoint_id, hours_back=24):
    """Export logs to pandas DataFrame for analysis"""
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(hours=hours_back)
    
    # Get all logs
    logs = client.endpoints.get_logs(
        endpoint_id=endpoint_id,
        start_time=start_time.isoformat() + 'Z',
        end_time=end_time.isoformat() + 'Z',
        level='debug',  # Get all levels
        limit=10000
    )
    
    # Convert to DataFrame
    log_data = []
    for log in logs.logs:
        row = {
            'timestamp': pd.to_datetime(log.timestamp),
            'level': log.level,
            'message': log.message,
            'source': log.source,
            'job_id': log.job_id,
            'request_id': log.request_id,
            'duration': log.duration,
            'memory_usage': log.memory_usage,
            'gpu_utilization': log.gpu_utilization
        }
        
        # Add metadata fields
        if log.metadata:
            for key, value in log.metadata.items():
                row[f'meta_{key}'] = value
        
        log_data.append(row)
    
    df = pd.DataFrame(log_data)
    
    # Basic analysis
    print(f"Log Analysis Summary:")
    print(f"Total logs: {len(df)}")
    print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
    print(f"Log levels:")
    print(df['level'].value_counts())
    print(f"\nTop sources:")
    print(df['source'].value_counts().head())
    
    return df

# Automated error alerting
def setup_error_alerting(endpoint_ids, check_interval=300):
    """Set up automated error alerting for multiple endpoints"""
    
    def check_recent_errors():
        end_time = datetime.utcnow()
        start_time = end_time - timedelta(seconds=check_interval)
        
        alerts = []
        
        for endpoint_id in endpoint_ids:
            try:
                error_logs = client.endpoints.get_logs(
                    endpoint_id=endpoint_id,
                    start_time=start_time.isoformat() + 'Z',
                    end_time=end_time.isoformat() + 'Z',
                    level='error'
                )
                
                if error_logs.logs:
                    error_count = len(error_logs.logs)
                    alerts.append({
                        'endpoint_id': endpoint_id,
                        'error_count': error_count,
                        'recent_errors': error_logs.logs[:3]  # first three returned entries
                    })
                    
            except Exception as e:
                print(f"Error checking logs for {endpoint_id}: {e}")
        
        if alerts:
            print(f"\n🚨 ERROR ALERT - {datetime.utcnow().isoformat()}")
            for alert in alerts:
                print(f"Endpoint {alert['endpoint_id']}: {alert['error_count']} new errors")
                for error in alert['recent_errors']:
                    print(f"  - {error.message}")
        
        return alerts
    
    print(f"Starting error monitoring for {len(endpoint_ids)} endpoints...")
    print(f"Check interval: {check_interval} seconds")
    
    try:
        while True:
            check_recent_errors()
            time.sleep(check_interval)
    except KeyboardInterrupt:
        print("\nStopping error monitoring...")

# Usage examples
if __name__ == "__main__":
    endpoint_id = "ep_1234567890abcdef"
    
    # Basic log retrieval
    logs = get_endpoint_logs(endpoint_id, hours_back=2)
    
    # Error analysis
    error_analysis = analyze_errors(endpoint_id, hours_back=24)
    
    # Performance analysis
    perf_data = analyze_performance(endpoint_id, hours_back=6)
    
    # Export to DataFrame for advanced analysis
    df = export_logs_to_dataframe(endpoint_id, hours_back=12)
    
    # Real-time monitoring (uncomment to run)
    # monitor_logs_realtime(endpoint_id)
    
    # Automated alerting (uncomment to run)
    # endpoints = ["ep_1234567890abcdef", "ep_2345678901bcdefg"]
    # setup_error_alerting(endpoints, check_interval=300)

JavaScript SDK

import { TensorOneClient } from "@tensorone/sdk";
import fs from 'fs';
// Node has no built-in EventSource in this context; the 'eventsource' npm
// package also accepts custom headers (the browser EventSource does not).
import EventSource from 'eventsource';

const client = new TensorOneClient({ apiKey: "your_api_key" });

// Basic log retrieval
async function getEndpointLogs(endpointId, hoursBack = 1) {
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - (hoursBack * 60 * 60 * 1000));
    
    const logs = await client.endpoints.getLogs(endpointId, {
        startTime: startTime.toISOString(),
        endTime: endTime.toISOString(),
        level: 'info',
        limit: 1000
    });
    
    console.log(`Retrieved ${logs.logs.length} logs for ${endpointId}`);
    
    // Display summary
    const summary = logs.summary;
    console.log('Log levels:', summary.logLevels);
    console.log('Sources:', summary.sources);
    
    return logs;
}

// Real-time log monitoring with EventSource
async function monitorLogsRealtime(endpointId, options = {}) {
    const { 
        onLog = console.log,
        onError = console.error,
        levelFilter = 'info',
        reconnectDelay = 5000 
    } = options;
    
    let reconnectTimeout;
    
    function connect() {
        const eventSource = new EventSource(
            `https://api.tensorone.ai/v2/endpoints/${endpointId}/logs?stream=true&level=${levelFilter}`,
            {
                headers: {
                    'Authorization': `Bearer ${process.env.TENSORONE_API_KEY}`
                }
            }
        );
        
        eventSource.onmessage = (event) => {
            try {
                const logEntry = JSON.parse(event.data);
                onLog(logEntry);
                
                // Alert on errors
                if (logEntry.level === 'error' || logEntry.level === 'fatal') {
                    console.warn(`🚨 ${logEntry.level.toUpperCase()}: ${logEntry.message}`);
                }
            } catch (error) {
                console.error('Error parsing log entry:', error);
            }
        };
        
        eventSource.onerror = (error) => {
            console.error('Log stream error:', error);
            eventSource.close();
            
            // Reconnect after delay
            console.log(`Reconnecting in ${reconnectDelay/1000} seconds...`);
            reconnectTimeout = setTimeout(connect, reconnectDelay);
        };
        
        // Handle graceful shutdown
        process.on('SIGINT', () => {
            console.log('\nClosing log stream...');
            eventSource.close();
            if (reconnectTimeout) {
                clearTimeout(reconnectTimeout);
            }
            process.exit(0);
        });
        
        return eventSource;
    }
    
    return connect();
}

// Error pattern analysis
async function analyzeErrorPatterns(endpointId, hoursBack = 24) {
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - (hoursBack * 60 * 60 * 1000));
    
    const errorLogs = await client.endpoints.getLogs(endpointId, {
        startTime: startTime.toISOString(),
        endTime: endTime.toISOString(),
        level: 'error',
        limit: 5000
    });
    
    console.log(`Error Analysis for ${endpointId}`);
    console.log(`Total errors in last ${hoursBack} hours: ${errorLogs.logs.length}`);
    
    // Analyze error patterns
    const errorCodes = {};
    const errorSources = {};
    const errorTimeline = {};
    
    errorLogs.logs.forEach(log => {
        // Count error codes
        if (log.metadata?.errorCode) {
            errorCodes[log.metadata.errorCode] = (errorCodes[log.metadata.errorCode] || 0) + 1;
        }
        
        // Count error sources
        errorSources[log.source] = (errorSources[log.source] || 0) + 1;
        
        // Create hourly timeline
        const hour = new Date(log.timestamp).toISOString().substring(0, 13) + ':00:00Z';
        errorTimeline[hour] = (errorTimeline[hour] || 0) + 1;
    });
    
    console.log('\nTop Error Codes:');
    Object.entries(errorCodes)
        .sort(([,a], [,b]) => b - a)
        .forEach(([code, count]) => {
            console.log(`  ${code}: ${count}`);
        });
    
    console.log('\nErrors by Source:');
    Object.entries(errorSources)
        .sort(([,a], [,b]) => b - a)
        .forEach(([source, count]) => {
            console.log(`  ${source}: ${count}`);
        });
    
    console.log('\nError Timeline (hourly):');
    Object.entries(errorTimeline)
        .sort(([a], [b]) => a.localeCompare(b))
        .forEach(([hour, count]) => {
            console.log(`  ${hour}: ${count} errors`);
        });
    
    return {
        errorCodes,
        errorSources,
        errorTimeline,
        totalErrors: errorLogs.logs.length
    };
}

// Performance metrics from logs
async function analyzePerformanceFromLogs(endpointId, hoursBack = 6) {
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - (hoursBack * 60 * 60 * 1000));
    
    const perfLogs = await client.endpoints.getLogs(endpointId, {
        startTime: startTime.toISOString(),
        endTime: endTime.toISOString(),
        search: 'duration|latency|performance|completed',
        level: 'info',
        limit: 10000
    });
    
    const durations = [];
    const memoryUsage = [];
    const gpuUtilization = [];
    
    perfLogs.logs.forEach(log => {
        if (log.duration) {
            durations.push(log.duration);
        }
        
        // memoryUsage and gpuUtilization are top-level log fields (see the JSON response above)
        if (typeof log.memoryUsage === 'string') {
            const memoryMatch = log.memoryUsage.match(/(\d+\.?\d*)GB/);
            if (memoryMatch) {
                memoryUsage.push(parseFloat(memoryMatch[1]));
            }
        }
        
        if (typeof log.gpuUtilization === 'number') {
            gpuUtilization.push(log.gpuUtilization);
        }
    });
    
    const analysis = {
        durations: {
            count: durations.length,
            average: durations.length ? durations.reduce((a, b) => a + b, 0) / durations.length : 0,
            min: durations.length ? Math.min(...durations) : 0,
            max: durations.length ? Math.max(...durations) : 0,
            p95: durations.length ? percentile(durations, 95) : 0
        },
        memory: {
            count: memoryUsage.length,
            average: memoryUsage.length ? memoryUsage.reduce((a, b) => a + b, 0) / memoryUsage.length : 0,
            peak: memoryUsage.length ? Math.max(...memoryUsage) : 0
        },
        gpu: {
            count: gpuUtilization.length,
            average: gpuUtilization.length ? gpuUtilization.reduce((a, b) => a + b, 0) / gpuUtilization.length : 0,
            peak: gpuUtilization.length ? Math.max(...gpuUtilization) : 0
        }
    };
    
    console.log('Performance Analysis:');
    console.log(`  Average Duration: ${analysis.durations.average.toFixed(2)}s`);
    console.log(`  P95 Duration: ${analysis.durations.p95.toFixed(2)}s`);
    console.log(`  Peak Memory Usage: ${analysis.memory.peak.toFixed(1)}GB`);
    console.log(`  Average GPU Utilization: ${analysis.gpu.average.toFixed(1)}%`);
    
    return analysis;
}

// Helper function to calculate percentiles
function percentile(arr, p) {
    const sorted = [...arr].sort((a, b) => a - b);
    const index = (p / 100) * (sorted.length - 1);
    const lower = Math.floor(index);
    const upper = Math.ceil(index);
    
    if (lower === upper) {
        return sorted[lower];
    }
    
    const weight = index - lower;
    return sorted[lower] * (1 - weight) + sorted[upper] * weight;
}

// Log export functionality
async function exportLogs(endpointId, format = 'json', hoursBack = 24) {
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - (hoursBack * 60 * 60 * 1000));
    
    const logs = await client.endpoints.getLogs(endpointId, {
        startTime: startTime.toISOString(),
        endTime: endTime.toISOString(),
        format: format,
        limit: 10000
    });
    
    const filename = `${endpointId}_logs_${startTime.toISOString().split('T')[0]}.${format}`;
    
    if (format === 'json') {
        fs.writeFileSync(filename, JSON.stringify(logs, null, 2));
    } else {
        fs.writeFileSync(filename, logs);
    }
    
    console.log(`Logs exported to ${filename}`);
    return filename;
}

// Automated error alerting
class LogAlerting {
    constructor(endpointIds, options = {}) {
        this.endpointIds = endpointIds;
        this.checkInterval = options.checkInterval || 300000; // 5 minutes
        this.errorThreshold = options.errorThreshold || 5;
        this.callbacks = {
            onAlert: options.onAlert || this.defaultAlertHandler,
            onError: options.onError || console.error
        };
        this.isRunning = false;
        this.intervalId = null;
    }
    
    defaultAlertHandler(alerts) {
        console.log(`\n🚨 ERROR ALERTS - ${new Date().toISOString()}`);
        alerts.forEach(alert => {
            console.log(`Endpoint ${alert.endpointId}: ${alert.errorCount} new errors`);
            alert.recentErrors.slice(0, 3).forEach(error => {
                console.log(`  - ${error.message}`);
            });
        });
    }
    
    async checkForErrors() {
        const endTime = new Date();
        const startTime = new Date(endTime.getTime() - this.checkInterval);
        
        const alerts = [];
        
        for (const endpointId of this.endpointIds) {
            try {
                const errorLogs = await client.endpoints.getLogs(endpointId, {
                    startTime: startTime.toISOString(),
                    endTime: endTime.toISOString(),
                    level: 'error',
                    limit: 100
                });
                
                if (errorLogs.logs.length >= this.errorThreshold) {
                    alerts.push({
                        endpointId,
                        errorCount: errorLogs.logs.length,
                        recentErrors: errorLogs.logs
                    });
                }
                
            } catch (error) {
                this.callbacks.onError(`Error checking logs for ${endpointId}:`, error);
            }
        }
        
        if (alerts.length > 0) {
            this.callbacks.onAlert(alerts);
        }
        
        return alerts;
    }
    
    start() {
        if (this.isRunning) {
            console.log('Alerting is already running');
            return;
        }
        
        console.log(`Starting error monitoring for ${this.endpointIds.length} endpoints...`);
        console.log(`Check interval: ${this.checkInterval / 1000} seconds`);
        console.log(`Error threshold: ${this.errorThreshold} errors per interval`);
        
        this.isRunning = true;
        this.intervalId = setInterval(() => {
            this.checkForErrors();
        }, this.checkInterval);
        
        // Initial check
        this.checkForErrors();
    }
    
    stop() {
        if (!this.isRunning) {
            console.log('Alerting is not running');
            return;
        }
        
        console.log('Stopping error monitoring...');
        this.isRunning = false;
        
        if (this.intervalId) {
            clearInterval(this.intervalId);
            this.intervalId = null;
        }
    }
}

// Usage examples
async function main() {
    const endpointId = "ep_1234567890abcdef";
    const endpointIds = ["ep_1234567890abcdef", "ep_2345678901bcdefg"];
    
    try {
        // Basic log retrieval
        const logs = await getEndpointLogs(endpointId, 2);
        
        // Error analysis
        const errorAnalysis = await analyzeErrorPatterns(endpointId, 24);
        
        // Performance analysis
        const perfAnalysis = await analyzePerformanceFromLogs(endpointId, 6);
        
        // Export logs
        await exportLogs(endpointId, 'json', 12);
        
        // Set up automated alerting
        const alerting = new LogAlerting(endpointIds, {
            checkInterval: 300000, // 5 minutes
            errorThreshold: 3,
            onAlert: (alerts) => {
                // Custom alert handling
                console.log('Custom alert handler triggered!');
                alerts.forEach(alert => {
                    console.log(`🔥 ${alert.endpointId}: ${alert.errorCount} errors!`);
                });
            }
        });
        
        // Start alerting (uncomment to run)
        // alerting.start();
        
        // Real-time monitoring (uncomment to run)
        // monitorLogsRealtime(endpointId, {
        //     levelFilter: 'info',
        //     onLog: (log) => {
        //         console.log(`[${log.timestamp}] ${log.level}: ${log.message}`);
        //     }
        // });
        
    } catch (error) {
        console.error("Log analysis error:", error);
    }
}

main();

Use Cases

Production Debugging

  • Error Investigation: Quickly identify and analyze production errors
  • Performance Troubleshooting: Diagnose latency and throughput issues
  • Resource Problems: Monitor memory leaks and resource exhaustion
  • Integration Issues: Debug API calls and external service failures

Development and Testing

  • Development Debugging: Monitor application behavior during development
  • Load Testing: Analyze system behavior under load
  • Performance Optimization: Identify optimization opportunities
  • Quality Assurance: Verify correct application behavior

Operations and Monitoring

  • Real-time Monitoring: Monitor system health in real-time
  • Alerting Systems: Set up automated alerts for critical issues
  • Compliance Auditing: Maintain audit trails for compliance requirements
  • Capacity Planning: Analyze usage patterns for capacity planning

Business Intelligence

  • Usage Analytics: Understand user behavior and usage patterns
  • Performance Metrics: Track application performance over time
  • Cost Analysis: Analyze operational costs and optimization opportunities
  • Trend Analysis: Identify patterns and trends in application usage

Best Practices

Log Management

  • Structured Logging: Use structured log formats for easier analysis
  • Log Levels: Use appropriate log levels to control verbosity
  • Retention Policies: Define retention policies based on compliance requirements
  • Storage Optimization: Use appropriate storage tiers for different log types

Performance Considerations

  • Filtering: Use specific filters to reduce data transfer and processing
  • Pagination: Use pagination for large log sets to avoid timeouts
  • Streaming: Use streaming for real-time monitoring instead of polling
  • Caching: Cache log analysis results when appropriate

Security and Compliance

  • Access Control: Implement proper access controls for sensitive logs
  • Data Privacy: Ensure logs don’t contain sensitive personal information
  • Audit Trails: Maintain audit trails for log access and modifications
  • Encryption: Use encryption for sensitive log data

Monitoring and Alerting

  • Proactive Monitoring: Set up proactive monitoring for critical issues
  • Alert Thresholds: Set appropriate thresholds to avoid alert fatigue
  • Escalation Procedures: Define clear escalation procedures for different alert types
  • Integration: Integrate with existing monitoring and alerting systems
Logs are retained for 30 days by default. For longer retention, consider exporting logs to your own storage system or upgrading to a plan with extended retention.
Log streaming consumes resources and should be used judiciously. Close streaming connections when not needed to avoid unnecessary resource usage.
Use structured search patterns and metadata filtering to quickly find relevant logs. Consider setting up automated log analysis pipelines for common debugging scenarios.

Authorizations

  • Authorization (string, header, required): API key authentication. Use 'Bearer YOUR_API_KEY' format.

Path Parameters

  • endpointId (string, required): The unique identifier of the endpoint to retrieve logs for.

Query Parameters

  • lines (integer, default 100, maximum 1000): Number of log lines to retrieve.

Response

200 - application/json: Endpoint logs. The response is of type object.