Overview

The TensorOne API implements rate limiting to ensure fair usage and maintain service quality for all users. Rate limits vary by endpoint type, subscription plan, and API key permissions.

Rate Limit Tiers

Free Tier

  • Read Operations: 100 requests/hour
  • Write Operations: 20 requests/hour
  • AI Services: 50 requests/hour
  • Training Jobs: 5 jobs/day

Pro Tier

  • Read Operations: 1,000 requests/hour
  • Write Operations: 200 requests/hour
  • AI Services: 500 requests/hour
  • Training Jobs: 20 jobs/day

Enterprise Tier

  • Read Operations: 10,000 requests/hour
  • Write Operations: 2,000 requests/hour
  • AI Services: 5,000 requests/hour
  • Training Jobs: Unlimited

Rate Limit Headers

Every API response includes rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640998800
X-RateLimit-Window: 3600
Retry-After: 60

Header Descriptions

  • X-RateLimit-Limit: Maximum requests allowed in the time window
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Unix timestamp when the rate limit resets
  • X-RateLimit-Window: Rate limit window in seconds
  • Retry-After: Seconds to wait before making another request (when rate limited)
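The headers above can be read in code before deciding whether to send the next request. A minimal sketch (the helper name and return shape are ours, not part of an SDK):

```python
def parse_rate_limit(headers):
    """Extract rate-limit fields from a response header mapping."""
    def _int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": _int("X-RateLimit-Limit"),
        "remaining": _int("X-RateLimit-Remaining"),
        "reset": _int("X-RateLimit-Reset"),    # Unix timestamp
        "window": _int("X-RateLimit-Window"),  # seconds
        "retry_after": _int("Retry-After"),    # only present when rate limited
    }
```

This works with any mapping-like headers object, e.g. `parse_rate_limit(response.headers)` with the `requests` library.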

Endpoint-Specific Limits

Account Management

GET /accounts/* - 500/hour
POST /accounts/* - 50/hour
PUT /accounts/* - 50/hour
DELETE /accounts/* - 10/hour

GPU Clusters

GET /clusters/* - 1000/hour
POST /clusters - 20/hour (creation)
PUT /clusters/* - 100/hour
DELETE /clusters/* - 20/hour

Serverless Endpoints

GET /endpoints/* - 1000/hour
POST /endpoints - 50/hour (creation)
POST /endpoints/*/execute - 500/hour (execution)

AI Services

POST /ai/chat - 200/hour
POST /ai/text-to-image - 100/hour
POST /ai/text-to-video - 20/hour
POST /ai/text-to-speech - 150/hour

Training Jobs

GET /training/* - 500/hour
POST /training/jobs - 10/hour
PUT /training/* - 50/hour
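With per-endpoint hourly caps like these, a client-side limiter can refuse calls locally before the server ever returns a 429. A sketch of a sliding-window guard (the class is illustrative, not part of an SDK; the injectable clock exists only to make it testable):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side guard: at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=3600, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls

    def allow(self):
        now = self.clock()
        # Evict timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```

Usage: create one limiter per endpoint category (e.g. `SlidingWindowLimiter(500)` for a 500/hour cap) and call `allow()` before each request.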

Rate Limit Strategies

1. Exponential Backoff

Implement exponential backoff when rate limited:
import time
import random

def make_request_with_backoff(func, max_retries=5):
    """Call func() and retry on 429 responses with exponential backoff."""
    for attempt in range(max_retries):
        try:
            response = func()
            if response.status_code != 429:
                return response
            # Prefer the server's Retry-After hint when it is present.
            retry_after = response.headers.get("Retry-After")
            if retry_after is not None:
                time.sleep(int(retry_after))
                continue
        except Exception:
            if attempt == max_retries - 1:
                raise

        # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
        delay = (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)

    raise Exception("Max retries exceeded")

2. Request Batching

Batch multiple operations when possible:
# Instead of multiple single requests
curl -X POST "https://api.tensorone.ai/v2/endpoints/ep1/execute" -d '...'
curl -X POST "https://api.tensorone.ai/v2/endpoints/ep2/execute" -d '...'

# Use batch execution
curl -X POST "https://api.tensorone.ai/v2/endpoints/batch-execute" \
  -d '{
    "requests": [
      {"endpoint_id": "ep1", "input": {...}},
      {"endpoint_id": "ep2", "input": {...}}
    ]
  }'
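The same batch body can be built in Python. A hedged sketch mirroring the curl example above (the helper name is ours, not part of an SDK):

```python
def build_batch_payload(calls):
    """calls: iterable of (endpoint_id, input_dict) pairs -> batch-execute body."""
    return {
        "requests": [
            {"endpoint_id": endpoint_id, "input": payload}
            for endpoint_id, payload in calls
        ]
    }

# e.g. requests.post("https://api.tensorone.ai/v2/endpoints/batch-execute",
#                    json=build_batch_payload(calls), headers=auth_headers)
```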

3. Caching

Cache responses when appropriate:
import requests
from functools import lru_cache
import time

@lru_cache(maxsize=128)
def get_cluster_info(cluster_id, cache_time):
    # cache_time is part of the cache key, so entries effectively expire
    response = requests.get(
        f"https://api.tensorone.ai/v2/clusters/{cluster_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    return response.json()

# Pass the current hour so each cached entry is reused for at most 1 hour
current_hour = int(time.time() // 3600)
cluster_info = get_cluster_info("cluster_123", current_hour)

Monitoring Rate Limits

Check Current Usage

curl -X GET "https://api.tensorone.ai/v2/auth/rate-limits" \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
    "limits": {
        "read_operations": {
            "limit": 1000,
            "remaining": 847,
            "reset_at": "2024-01-15T11:00:00Z"
        },
        "write_operations": {
            "limit": 200,
            "remaining": 195,
            "reset_at": "2024-01-15T11:00:00Z"
        },
        "ai_services": {
            "limit": 500,
            "remaining": 423,
            "reset_at": "2024-01-15T11:00:00Z"
        }
    }
}
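The `limits` object in this response can drive an early warning before any category is exhausted. A sketch (the function and threshold are ours, assuming only the response shape shown above):

```python
def low_categories(limits, threshold=0.1):
    """Return category names whose remaining quota is below threshold of the limit."""
    return [
        name
        for name, info in limits.items()
        if info["limit"] and info["remaining"] / info["limit"] < threshold
    ]
```

For example, with the sample response above (847/1000, 195/200, 423/500 remaining), no category is below 10%, so the list is empty.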

Usage Analytics

curl -X GET "https://api.tensorone.ai/v2/analytics/api-usage?period=24h" \
  -H "Authorization: Bearer YOUR_API_KEY"

Rate Limit Errors

429 Too Many Requests

{
    "error": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded for endpoint",
    "code": 429,
    "details": {
        "limit": 100,
        "remaining": 0,
        "reset_at": "2024-01-15T11:00:00Z",
        "retry_after": 1800
    }
}

Handling in Code

async function makeAPIRequest(url, options) {
    try {
        const response = await fetch(url, options);

        if (response.status === 429) {
            // Retry-After may be absent; fall back to 60 seconds so the
            // retry does not fire immediately (null * 1000 would be 0).
            const retryAfter = parseInt(response.headers.get("Retry-After") ?? "60", 10);
            console.log(`Rate limited. Retrying after ${retryAfter} seconds`);

            // Wait and retry
            await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
            return makeAPIRequest(url, options);
        }

        return response;
    } catch (error) {
        console.error("API request failed:", error);
        throw error;
    }
}

Optimization Tips

1. Use Appropriate HTTP Methods

  • Use HEAD requests to check resource existence
  • Use PATCH instead of PUT for partial updates
  • Implement conditional requests with If-Modified-Since
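A conditional request needs a correctly formatted `If-Modified-Since` value (RFC 7231 HTTP date). A sketch of building that header from a Unix timestamp (the helper name is ours):

```python
from email.utils import formatdate

def conditional_headers(last_modified_ts):
    """Build an If-Modified-Since header from a Unix timestamp."""
    # usegmt=True emits the RFC-compliant "GMT" suffix
    return {"If-Modified-Since": formatdate(last_modified_ts, usegmt=True)}
```

A 304 Not Modified response lets you reuse a cached body while consuming a cheaper round trip.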

2. Optimize Polling

# Instead of constant polling
while True:
    status = get_job_status(job_id)
    if status == 'completed':
        break
    time.sleep(5)  # Fixed 5s interval wastes rate limit on long jobs

# Use progressive intervals
def wait_for_completion(job_id):
    intervals = [5, 10, 30, 60]  # Progressive backoff, capped at 60s
    interval_index = 0

    while True:
        status = get_job_status(job_id)
        if status in ('completed', 'failed'):  # Stop on terminal states
            return status

        interval = intervals[min(interval_index, len(intervals) - 1)]
        time.sleep(interval)
        interval_index += 1

3. Request Prioritization

Use priority headers for critical requests:
curl -X POST "https://api.tensorone.ai/v2/endpoints/critical/execute" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Priority: high" \
  -d '...'

Increasing Rate Limits

Upgrade Your Plan

Higher tier plans come with increased rate limits:
  • Pro Plan: roughly 10x the Free tier hourly limits (4x the daily training-job limit)
  • Enterprise Plan: a further 10x over Pro, with unlimited training jobs and custom limits available

Request Limit Increase

For specific use cases, contact support with:
  • Expected request volume
  • Use case description
  • Timeline requirements
  • Current plan tier

Temporary Limit Boosts

For events or migrations:
curl -X POST "https://api.tensorone.ai/v2/auth/temporary-boost" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "multiplier": 2,
    "duration": "24h",
    "reason": "Data migration"
  }'

Consider using webhooks instead of polling to reduce API calls and stay within rate limits.
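A minimal webhook receiver using only the standard library might look like the sketch below. The payload fields (`event`, `job_id`, `status`) are assumptions for illustration, not a documented TensorOne schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize_event(payload):
    """Turn an assumed job-status payload into a log line (fields are hypothetical)."""
    return f"{payload.get('event', 'unknown')}: job {payload.get('job_id')} is {payload.get('status')}"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(summarize_event(payload))
        self.send_response(204)  # acknowledge quickly; do real work elsewhere
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Because the server pushes status changes to you, a job that runs for an hour costs zero polling requests.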