Update Endpoint

Update the configuration of an existing serverless endpoint, including its scaling parameters, worker count, environment variables, and other settings.

Path Parameters

  • endpointId: The unique identifier of the endpoint to update

Request Body

{
    "workerCount": 3,
    "maxConcurrency": 15,
    "timeoutSeconds": 600,
    "environmentVariables": {
        "MODEL_PRECISION": "fp16",
        "BATCH_SIZE": "4",
        "CACHE_SIZE": "2048"
    },
    "autoScaling": {
        "enabled": true,
        "minWorkers": 1,
        "maxWorkers": 10,
        "targetUtilization": 70
    }
}

As the usage examples below show, you can include only the fields you want to change.

Example Usage

Update Worker Count and Timeout

curl -X PUT "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "workerCount": 5,
    "timeoutSeconds": 900,
    "maxConcurrency": 20
  }'

Enable Auto-Scaling

curl -X PUT "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "autoScaling": {
      "enabled": true,
      "minWorkers": 2,
      "maxWorkers": 8,
      "targetUtilization": 75,
      "scaleUpCooldown": 60,
      "scaleDownCooldown": 300
    }
  }'

Update Environment Variables

curl -X PUT "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "environmentVariables": {
      "MODEL_VERSION": "v2.1",
      "ENABLE_LOGGING": "true",
      "MAX_BATCH_SIZE": "8"
    }
  }'

Response

Returns the updated endpoint configuration:
{
    "id": "ep_1234567890abcdef",
    "name": "my-text-generation-model",
    "status": "updating",
    "configuration": {
        "workerCount": 5,
        "maxConcurrency": 20,
        "timeoutSeconds": 900,
        "environmentVariables": {
            "MODEL_VERSION": "v2.1",
            "ENABLE_LOGGING": "true",
            "MAX_BATCH_SIZE": "8"
        },
        "autoScaling": {
            "enabled": true,
            "minWorkers": 2,
            "maxWorkers": 8,
            "targetUtilization": 75,
            "scaleUpCooldown": 60,
            "scaleDownCooldown": 300
        }
    },
    "updatedAt": "2024-01-15T14:30:00Z"
}

Configuration Parameters

Scaling Configuration

  • workerCount: Number of workers (1-50)
  • maxConcurrency: Maximum concurrent requests per worker; total capacity is workerCount × maxConcurrency, so 5 workers at a concurrency of 20 can serve up to 100 requests at once
  • timeoutSeconds: Request timeout in seconds (10-3600); out-of-range values are rejected with a 400 (see the validation sketch below)
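
These limits are enforced server-side with a 400 response (see Error Handling below), but a lightweight client-side check can catch mistakes before a request is sent. A minimal sketch in Python, covering only the ranges documented above (maxConcurrency has no documented bound here, so it is left unchecked):

def validate_scaling_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the values are in range."""
    errors = []
    workers = config.get("workerCount")
    if workers is not None and not 1 <= workers <= 50:
        errors.append(f"workerCount must be 1-50, got {workers}")
    timeout = config.get("timeoutSeconds")
    if timeout is not None and not 10 <= timeout <= 3600:
        errors.append(f"timeoutSeconds must be 10-3600, got {timeout}")
    return errors

print(validate_scaling_config({"workerCount": 100, "timeoutSeconds": 600}))
# ['workerCount must be 1-50, got 100']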

Auto-Scaling Configuration

  • enabled: Enable/disable auto-scaling
  • minWorkers: Minimum number of workers
  • maxWorkers: Maximum number of workers
  • targetUtilization: Target CPU utilization percentage (10-90)
  • scaleUpCooldown: Cooldown, in seconds, before another scale-up can occur
  • scaleDownCooldown: Cooldown, in seconds, before another scale-down can occur (the sketch below shows how these settings interact)
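
The scaler itself runs on the platform, so the logic below is purely illustrative: it shows how these parameters typically interact, with observed utilization compared against targetUtilization and the two cooldowns gating how often the worker count may move. It is a sketch of the semantics, not TensorOne's actual algorithm.

import time

def decide_worker_count(workers, utilization, cfg, last_up, last_down):
    """Illustrative scaling decision; the real scaler runs server-side."""
    now = time.time()
    if not cfg["enabled"]:
        return workers
    # Scale up when utilization exceeds the target and the up-cooldown has elapsed.
    if (utilization > cfg["targetUtilization"]
            and workers < cfg["maxWorkers"]
            and now - last_up >= cfg["scaleUpCooldown"]):
        return workers + 1
    # Scale down when utilization is below target and the down-cooldown has elapsed.
    if (utilization < cfg["targetUtilization"]
            and workers > cfg["minWorkers"]
            and now - last_down >= cfg["scaleDownCooldown"]):
        return workers - 1
    return workers

cfg = {"enabled": True, "minWorkers": 2, "maxWorkers": 8,
       "targetUtilization": 75, "scaleUpCooldown": 60, "scaleDownCooldown": 300}
print(decide_worker_count(4, 90, cfg, last_up=0, last_down=0))  # 5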

Environment Variables

  • Custom key-value pairs passed to your model runtime as process environment variables (read at startup, as sketched below)
  • Useful for model configuration, feature flags, and runtime parameters
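
Inside a worker, these values arrive as ordinary environment variables. A minimal sketch of reading them at model startup, using the variable names from the example above:

import os

# All values arrive as strings; parse numeric and boolean settings explicitly.
model_version = os.environ.get("MODEL_VERSION", "v1.0")
logging_enabled = os.environ.get("ENABLE_LOGGING", "false").lower() == "true"
max_batch_size = int(os.environ.get("MAX_BATCH_SIZE", "1"))

print(model_version, logging_enabled, max_batch_size)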

Error Handling

400 Bad Request

{
    "error": "INVALID_CONFIGURATION",
    "message": "Worker count exceeds maximum limit",
    "details": {
        "field": "workerCount",
        "value": 100,
        "maximum": 50
    }
}

409 Conflict

{
    "error": "ENDPOINT_BUSY",
    "message": "Cannot update endpoint while executing requests",
    "details": {
        "activeRequests": 15,
        "suggestion": "Wait for active requests to complete or force update"
    }
}
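
One way to handle both cases with plain requests: treat a 400 as a hard failure, since the configuration itself is invalid, and retry a 409 after a delay, since it only signals in-flight requests. The retry policy here is illustrative; tune the delay and attempt count to your traffic.

import time
import requests

def update_endpoint(endpoint_id, payload, api_key, retries=3, delay=30):
    url = f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(retries):
        resp = requests.put(url, json=payload, headers=headers)
        if resp.status_code == 409:
            # Endpoint is busy with active requests; wait and retry.
            time.sleep(delay)
            continue
        if resp.status_code == 400:
            # Invalid configuration; retrying will not help.
            raise ValueError(resp.json()["message"])
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Endpoint still busy after all retries")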

SDK Examples

Python SDK

from tensorone import TensorOneClient

client = TensorOneClient(api_key="your_api_key")

# Update worker configuration
endpoint = client.endpoints.update(
    endpoint_id="ep_1234567890abcdef",
    worker_count=5,
    max_concurrency=20,
    timeout_seconds=900
)

# Enable auto-scaling
endpoint = client.endpoints.update(
    endpoint_id="ep_1234567890abcdef",
    auto_scaling={
        "enabled": True,
        "min_workers": 2,
        "max_workers": 8,
        "target_utilization": 75
    }
)

# Update environment variables
endpoint = client.endpoints.update(
    endpoint_id="ep_1234567890abcdef",
    environment_variables={
        "MODEL_VERSION": "v2.1",
        "ENABLE_CACHING": "true"
    }
)

JavaScript SDK

import { TensorOneClient } from "@tensorone/sdk";

const client = new TensorOneClient({ apiKey: "your_api_key" });

// Update scaling configuration
const endpoint = await client.endpoints.update("ep_1234567890abcdef", {
    workerCount: 5,
    maxConcurrency: 20,
    timeoutSeconds: 900,
});

// Enable auto-scaling with custom parameters
const scaledEndpoint = await client.endpoints.update("ep_1234567890abcdef", {
    autoScaling: {
        enabled: true,
        minWorkers: 2,
        maxWorkers: 8,
        targetUtilization: 75,
        scaleUpCooldown: 60,
        scaleDownCooldown: 300,
    },
});

Best Practices

Performance Optimization

  • Worker Count: Start with 2-3 workers and scale based on demand
  • Concurrency: Set maxConcurrency based on your model’s memory requirements
  • Timeouts: Use shorter timeouts for interactive applications, longer for batch processing
  • Auto-Scaling: Enable for variable workloads to optimize costs

Cost Optimization

  • Right-Sizing: Monitor utilization and adjust worker count accordingly
  • Auto-Scaling: Use to automatically scale down during low traffic periods
  • Environment Variables: Use to enable/disable expensive features dynamically

Deployment Strategy

  • Gradual Updates: Update configuration during low-traffic periods
  • Testing: Test configuration changes on development endpoints first
  • Monitoring: Monitor metrics after updates to ensure desired performance

Configuration updates are applied gradually to minimize service disruption; the endpoint status reports "updating" during the transition period.
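
Because the status reports "updating" while changes roll out, a client can poll until the endpoint settles before sending traffic again. The sketch below assumes a GET /v2/endpoints/{endpointId} route that returns the same shape as the update response; adjust if your retrieval route differs.

import time
import requests

def wait_until_settled(endpoint_id, api_key, poll_seconds=5, timeout=300):
    url = f"https://api.tensorone.ai/v2/endpoints/{endpoint_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(url, headers=headers).json()["status"]
        if status != "updating":
            return status  # e.g. "active"
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint did not leave the updating state in time")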

Reducing worker count may temporarily increase latency as traffic redistributes. Plan updates during low-traffic periods.

Use environment variables to enable A/B testing by toggling features without redeploying your model, as in the sketch below.
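
For example, using the Python SDK call shown earlier (the AB_TEST_VARIANT name is illustrative, not a platform convention):

from tensorone import TensorOneClient

client = TensorOneClient(api_key="your_api_key")

# Flip the experiment flag without redeploying; the variable name
# AB_TEST_VARIANT is an example, not a platform convention.
client.endpoints.update(
    endpoint_id="ep_1234567890abcdef",
    environment_variables={"AB_TEST_VARIANT": "B"},
)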

Authorizations

  • Authorization (string, header, required): API key authentication. Use 'Bearer YOUR_API_KEY' format.

Path Parameters

  • endpointId (string, required): The unique identifier of the endpoint to update.

Body

  • Content type: application/json

Response

  • 200 (application/json): Endpoint updated successfully. The response is of type object.