Update the configuration of an existing serverless endpoint to modify scaling parameters, worker count, environment variables, and other settings.
Path Parameters
endpointId
: The unique identifier of the endpoint to update
Request Body
{
"workerCount": 3,
"maxConcurrency": 15,
"timeoutSeconds": 600,
"environmentVariables": {
"MODEL_PRECISION": "fp16",
"BATCH_SIZE": "4",
"CACHE_SIZE": "2048"
},
"autoScaling": {
"enabled": true,
"minWorkers": 1,
"maxWorkers": 10,
"targetUtilization": 70
}
}
Example Usage
Update Worker Count and Timeout
curl -X PUT "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"workerCount": 5,
"timeoutSeconds": 900,
"maxConcurrency": 20
}'
Enable Auto-Scaling
curl -X PUT "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"autoScaling": {
"enabled": true,
"minWorkers": 2,
"maxWorkers": 8,
"targetUtilization": 75,
"scaleUpCooldown": 60,
"scaleDownCooldown": 300
}
}'
Update Environment Variables
curl -X PUT "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"environmentVariables": {
"MODEL_VERSION": "v2.1",
"ENABLE_LOGGING": "true",
"MAX_BATCH_SIZE": "8"
}
}'
Response
Returns the updated endpoint configuration:
{
"id": "ep_1234567890abcdef",
"name": "my-text-generation-model",
"status": "updating",
"configuration": {
"workerCount": 5,
"maxConcurrency": 20,
"timeoutSeconds": 900,
"environmentVariables": {
"MODEL_VERSION": "v2.1",
"ENABLE_LOGGING": "true",
"MAX_BATCH_SIZE": "8"
},
"autoScaling": {
"enabled": true,
"minWorkers": 2,
"maxWorkers": 8,
"targetUtilization": 75,
"scaleUpCooldown": 60,
"scaleDownCooldown": 300
}
},
"updatedAt": "2024-01-15T14:30:00Z"
}
Configuration Parameters
Scaling Configuration
workerCount
: Number of workers (1-50)
maxConcurrency
: Maximum concurrent requests per worker
timeoutSeconds
: Request timeout in seconds (10-3600)
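As a worked example, workerCount: 5 with maxConcurrency: 20 caps the endpoint at roughly 100 in-flight requests (5 workers × 20 requests each), assuming traffic spreads evenly across workers.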
Auto-Scaling Configuration
enabled
: Enable/disable auto-scaling
minWorkers
: Minimum number of workers
maxWorkers
: Maximum number of workers
targetUtilization
: Target CPU utilization percentage (10-90)
scaleUpCooldown
: Minimum wait before another scale-up action (seconds)
scaleDownCooldown
: Minimum wait before another scale-down action (seconds)
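For intuition, target-utilization autoscalers commonly size the worker pool proportionally to observed load. The Python sketch below illustrates that general pattern; it is not TensorOne's documented algorithm, and the function name and inputs are hypothetical.
import math

def desired_workers(current_workers: int, utilization_pct: float,
                    target_pct: float, min_workers: int, max_workers: int) -> int:
    # Proportional rule used by many target-utilization autoscalers
    # (e.g. the Kubernetes HPA); illustrative, not TensorOne's documented algorithm.
    raw = math.ceil(current_workers * utilization_pct / target_pct)
    return max(min_workers, min(max_workers, raw))

# With targetUtilization 75: 4 workers at 90% load -> ceil(4 * 90 / 75) = 5 workers.
print(desired_workers(4, 90, 75, min_workers=2, max_workers=8))  # prints 5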
Environment Variables
- Custom key-value pairs passed to your model runtime
- Useful for model configuration, feature flags, and runtime parameters (see the sketch after this list)
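For instance, a handler running in a Python worker could read these values from the process environment at startup. This is a minimal sketch; the variable names mirror the example request body above, and the defaults are illustrative.
import os

# Values injected via environmentVariables appear in the process environment.
# Names mirror the example request body; the fallback defaults are illustrative.
MODEL_PRECISION = os.environ.get("MODEL_PRECISION", "fp32")
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", "1"))
CACHE_SIZE = int(os.environ.get("CACHE_SIZE", "1024"))

print(f"precision={MODEL_PRECISION} batch={BATCH_SIZE} cache={CACHE_SIZE}")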
Error Handling
400 Bad Request
{
"error": "INVALID_CONFIGURATION",
"message": "Worker count exceeds maximum limit",
"details": {
"field": "workerCount",
"value": 100,
"maximum": 50
}
}
409 Conflict
{
"error": "ENDPOINT_BUSY",
"message": "Cannot update endpoint while executing requests",
"details": {
"activeRequests": 15,
"suggestion": "Wait for active requests to complete or force update"
}
}
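A client can branch on these status codes when applying updates. The sketch below calls the raw HTTP API with Python's requests library and retries while the endpoint reports 409 ENDPOINT_BUSY; the retry count and wait interval are illustrative choices, not documented behavior.
import time
import requests

URL = "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

def update_endpoint(body: dict, attempts: int = 5, wait_seconds: int = 30) -> dict:
    for _ in range(attempts):
        resp = requests.put(URL, headers=HEADERS, json=body)
        if resp.status_code == 409:      # ENDPOINT_BUSY: wait for active requests to drain
            time.sleep(wait_seconds)
            continue
        resp.raise_for_status()          # raises on 400 INVALID_CONFIGURATION and other errors
        return resp.json()
    raise TimeoutError("endpoint stayed busy; retry during a low-traffic window")

endpoint = update_endpoint({"workerCount": 5})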
SDK Examples
Python SDK
from tensorone import TensorOneClient
client = TensorOneClient(api_key="your_api_key")
# Update worker configuration
endpoint = client.endpoints.update(
endpoint_id="ep_1234567890abcdef",
worker_count=5,
max_concurrency=20,
timeout_seconds=900
)
# Enable auto-scaling
endpoint = client.endpoints.update(
endpoint_id="ep_1234567890abcdef",
auto_scaling={
"enabled": True,
"min_workers": 2,
"max_workers": 8,
"target_utilization": 75
}
)
# Update environment variables
endpoint = client.endpoints.update(
endpoint_id="ep_1234567890abcdef",
environment_variables={
"MODEL_VERSION": "v2.1",
"ENABLE_CACHING": "true"
}
)
JavaScript SDK
import { TensorOneClient } from "@tensorone/sdk";
const client = new TensorOneClient({ apiKey: "your_api_key" });
// Update scaling configuration
const endpoint = await client.endpoints.update("ep_1234567890abcdef", {
workerCount: 5,
maxConcurrency: 20,
timeoutSeconds: 900,
});
// Enable auto-scaling with custom parameters
const scaledEndpoint = await client.endpoints.update("ep_1234567890abcdef", {
autoScaling: {
enabled: true,
minWorkers: 2,
maxWorkers: 8,
targetUtilization: 75,
scaleUpCooldown: 60,
scaleDownCooldown: 300,
},
});
Best Practices
- Worker Count: Start with 2-3 workers and scale based on demand
- Concurrency: Set maxConcurrency based on your model’s memory requirements
- Timeouts: Use shorter timeouts for interactive applications, longer for batch processing
- Auto-Scaling: Enable for variable workloads to optimize costs
Cost Optimization
- Right-Sizing: Monitor utilization and adjust worker count accordingly (see the sketch after this list)
- Auto-Scaling: Use to automatically scale down during low traffic periods
- Environment Variables: Use to enable/disable expensive features dynamically
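As one way to act on the right-sizing advice above, the sketch below recomputes a worker count from an observed utilization figure and applies it through the Python SDK. How you collect the utilization metric is left open, and the 70% target is an illustrative default.
import math
from tensorone import TensorOneClient

client = TensorOneClient(api_key="your_api_key")

def right_size(endpoint_id: str, current_workers: int,
               observed_utilization_pct: float, target_pct: float = 70) -> None:
    # Proportional right-sizing, clamped to the documented 1-50 worker range.
    desired = max(1, min(50, math.ceil(current_workers * observed_utilization_pct / target_pct)))
    if desired != current_workers:
        client.endpoints.update(endpoint_id=endpoint_id, worker_count=desired)

# Example: 3 workers running at 95% utilization -> scale to ceil(3 * 95 / 70) = 5.
right_size("ep_1234567890abcdef", current_workers=3, observed_utilization_pct=95)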
Deployment Strategy
- Gradual Updates: Update configuration during low-traffic periods
- Testing: Test configuration changes on development endpoints first
- Monitoring: Monitor metrics after updates to ensure desired performance
Configuration updates are applied gradually to minimize service disruption. The endpoint status will show updating
during the transition period.
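If a deployment script needs to block until the transition finishes, it can poll the endpoint until the status leaves updating. The sketch below assumes a GET on the endpoint URL returns the same status field shown in the response above; that assumption and the poll interval are illustrative, not documented behavior.
import time
import requests

URL = "https://api.tensorone.ai/v2/endpoints/ep_1234567890abcdef"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Poll until the endpoint leaves "updating". Assumes GET on the endpoint URL
# returns the same "status" field as the update response (an assumption).
for _ in range(60):                      # cap at ~10 minutes
    status = requests.get(URL, headers=HEADERS).json()["status"]
    if status != "updating":
        print(f"endpoint settled in status: {status}")
        break
    time.sleep(10)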
Reducing worker count may temporarily increase latency as traffic redistributes. Plan updates during low-traffic
periods.
Use environment variables to enable A/B testing by toggling features without redeploying your model.
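A minimal version of that A/B toggle with the Python SDK might look like this; ENABLE_NEW_SAMPLER is a hypothetical flag that your model runtime would check.
from tensorone import TensorOneClient

client = TensorOneClient(api_key="your_api_key")

# Flip a feature flag for an A/B test without redeploying the model.
# "ENABLE_NEW_SAMPLER" is a hypothetical flag your runtime would read.
client.endpoints.update(
    endpoint_id="ep_1234567890abcdef",
    environment_variables={"ENABLE_NEW_SAMPLER": "true"},
)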