List Cluster Templates
curl --request GET \
  --url https://api.tensorone.ai/v1/clusters/templates \
  --header 'Authorization: Bearer <api-key>'
[
  {
    "id": "<string>",
    "name": "<string>",
    "description": "<string>",
    "imageName": "<string>",
    "category": "official",
    "gpuRequired": true
  }
]

Overview

Cluster Templates provide pre-configured environments for consistent, repeatable cluster deployments. Templates include Docker images, environment variables, system configurations, and resource specifications that can be used to quickly spin up standardized development, training, or production environments.

Endpoints

List Templates

GET https://api.tensorone.ai/v1/clusters/templates

Get Template Details

GET https://api.tensorone.ai/v1/clusters/templates/{template_id}

Create Template

POST https://api.tensorone.ai/v1/clusters/templates

Update Template

PUT https://api.tensorone.ai/v1/clusters/templates/{template_id}

Delete Template

DELETE https://api.tensorone.ai/v1/clusters/templates/{template_id}

List Templates

Query Parameters

Parameter           Type     Required  Description
------------------  -------  --------  ------------------------------------------------------------------
category            string   No        Filter by category: ml, dev, production, custom
framework           string   No        Filter by ML framework: pytorch, tensorflow, huggingface, sklearn
gpu_compatible      boolean  No        Filter GPU-compatible templates
official            boolean  No        Filter official TensorOne templates
project_id          string   No        Filter by project (for custom templates)
search              string   No        Search templates by name or description
sort_by             string   No        Sort by: name, created_at, usage_count, rating
include_deprecated  boolean  No        Include deprecated templates (default: false)

Request Examples

# List PyTorch templates in the ml category
curl -X GET "https://api.tensorone.ai/v1/clusters/templates?category=ml&framework=pytorch" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for GPU-compatible Jupyter templates
curl -X GET "https://api.tensorone.ai/v1/clusters/templates?search=jupyter&gpu_compatible=true" \
  -H "Authorization: Bearer YOUR_API_KEY"

Create Template

Request Body

Parameter              Type     Required  Description
---------------------  -------  --------  ------------------------------------------------
name                   string   Yes       Template name (unique within project)
description            string   Yes       Template description
category               string   Yes       Template category
docker_image           string   Yes       Base Docker image
framework              string   No        ML framework if applicable
default_configuration  object   Yes       Default hardware configuration
environment_variables  object   No        Default environment variables
startup_script         string   No        Script to run on cluster start
port_mappings          array    No        Default port configurations
required_packages      array    No        Additional packages to install
gpu_compatible         boolean  No        Whether template supports GPUs
min_resources          object   No        Minimum resource requirements
max_resources          object   No        Maximum resource limits
tags                   array    No        Template tags for organization
is_public              boolean  No        Make template publicly available (default: false)

Request Examples

# Create PyTorch training template
curl -X POST "https://api.tensorone.ai/v1/clusters/templates" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "pytorch-distributed-training",
    "description": "PyTorch distributed training environment with NCCL support",
    "category": "ml",
    "framework": "pytorch", 
    "docker_image": "pytorch/pytorch:2.2-cuda12.1-devel",
    "gpu_compatible": true,
    "default_configuration": {
      "gpu_type": "A100",
      "gpu_count": 4,
      "cpu_cores": 32,
      "memory_gb": 256,
      "storage_gb": 1000
    },
    "environment_variables": {
      "NCCL_DEBUG": "INFO",
      "CUDA_VISIBLE_DEVICES": "0,1,2,3",
      "MASTER_ADDR": "localhost",
      "MASTER_PORT": "12355",
      "WORLD_SIZE": "4"
    },
    "startup_script": "#!/bin/bash\necho \"Starting distributed training environment\"\nnvidia-smi\npython -c \"import torch; print(f\\\"PyTorch version: {torch.__version__}\\\")\"",
    "port_mappings": [
      {
        "internal_port": 6006,
        "protocol": "tcp",
        "description": "TensorBoard"
      },
      {
        "internal_port": 8888,
        "protocol": "tcp", 
        "description": "Jupyter Lab"
      }
    ],
    "required_packages": [
      "tensorboard",
      "wandb",
      "transformers",
      "datasets"
    ],
    "min_resources": {
      "gpu_count": 1,
      "memory_gb": 32,
      "storage_gb": 100
    },
    "tags": ["pytorch", "distributed", "training", "gpu"],
    "is_public": false
  }'
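
Escaping a multi-line startup_script inside a curl -d payload is error-prone. From Python, the script can be written as an ordinary multi-line string and JSON encoding handles the escaping; a minimal sketch of the same request, abbreviated to the core fields:

import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

# A readable multi-line script; requests' JSON encoder escapes it for us
startup_script = """#!/bin/bash
echo "Starting distributed training environment"
nvidia-smi
python -c 'import torch; print(f"PyTorch version: {torch.__version__}")'
"""

payload = {
    "name": "pytorch-distributed-training",
    "description": "PyTorch distributed training environment with NCCL support",
    "category": "ml",
    "framework": "pytorch",
    "docker_image": "pytorch/pytorch:2.2-cuda12.1-devel",
    "gpu_compatible": True,
    "default_configuration": {
        "gpu_type": "A100", "gpu_count": 4,
        "cpu_cores": 32, "memory_gb": 256, "storage_gb": 1000,
    },
    "startup_script": startup_script,
}

response = requests.post(
    "https://api.tensorone.ai/v1/clusters/templates",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(response.json())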

Get Template Details

curl -X GET "https://api.tensorone.ai/v1/clusters/templates/tmpl_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Template Response Schema

{
  "success": true,
  "data": {
    "id": "tmpl_abc123",
    "name": "pytorch-distributed-training",
    "description": "PyTorch distributed training environment with NCCL support", 
    "category": "ml",
    "framework": "pytorch",
    "version": "2.1.0",
    "docker_image": "pytorch/pytorch:2.2-cuda12.1-devel",
    "gpu_compatible": true,
    "official": false,
    "default_configuration": {
      "gpu_type": "A100",
      "gpu_count": 4,
      "cpu_cores": 32,
      "memory_gb": 256,
      "storage_gb": 1000,
      "estimated_hourly_cost": 10.00
    },
    "environment_variables": {
      "NCCL_DEBUG": "INFO",
      "CUDA_VISIBLE_DEVICES": "0,1,2,3",
      "MASTER_ADDR": "localhost",
      "MASTER_PORT": "12355",
      "WORLD_SIZE": "4"
    },
    "startup_script": "#!/bin/bash\necho \"Starting distributed training environment\"\nnvidia-smi\npython -c \"import torch; print(f\\\"PyTorch version: {torch.__version__}\\\")\"",
    "port_mappings": [
      {
        "internal_port": 6006,
        "protocol": "tcp",
        "description": "TensorBoard",
        "required": false
      },
      {
        "internal_port": 8888,
        "protocol": "tcp",
        "description": "Jupyter Lab",
        "required": true
      }
    ],
    "required_packages": [
      "tensorboard",
      "wandb", 
      "transformers",
      "datasets"
    ],
    "software_versions": {
      "python": "3.9",
      "pytorch": "2.2.0",
      "cuda": "12.1",
      "cudnn": "8.8"
    },
    "resource_limits": {
      "min_resources": {
        "gpu_count": 1,
        "cpu_cores": 8,
        "memory_gb": 32,
        "storage_gb": 100
      },
      "max_resources": {
        "gpu_count": 8,
        "cpu_cores": 128,
        "memory_gb": 1024,
        "storage_gb": 10000
      }
    },
    "compatibility": {
      "gpu_types": ["A100", "H100", "RTX4090", "V100"],
      "regions": ["us-east-1", "us-west-2", "eu-west-1"],
      "min_driver_version": "520.61.05"
    },
    "usage_statistics": {
      "total_deployments": 1247,
      "active_clusters": 34,
      "average_rating": 4.7,
      "success_rate": 98.3
    },
    "tags": ["pytorch", "distributed", "training", "gpu"],
    "created_by": {
      "user_id": "user_456",
      "username": "ml_engineer",
      "organization": "TensorOne"
    },
    "created_at": "2024-01-10T09:00:00Z",
    "updated_at": "2024-01-14T15:30:00Z",
    "is_public": false,
    "is_deprecated": false
  }
}
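
With the template details in hand, a planned deployment can be sanity-checked against the template's declared limits before any cluster is created. A minimal sketch, using only field names from the response schema above:

import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

template = requests.get(
    "https://api.tensorone.ai/v1/clusters/templates/tmpl_abc123",
    headers={"Authorization": f"Bearer {API_KEY}"},
).json()

def validate_against_template(template_data, requested):
    """Return a list of problems with a requested configuration."""
    limits = template_data["resource_limits"]
    compat = template_data["compatibility"]
    problems = []

    if requested["gpu_type"] not in compat["gpu_types"]:
        problems.append(f"GPU type {requested['gpu_type']} is not supported")

    for key, minimum in limits["min_resources"].items():
        if requested.get(key, 0) < minimum:
            problems.append(f"{key} is below the template minimum ({minimum})")

    for key, maximum in limits["max_resources"].items():
        if requested.get(key, 0) > maximum:
            problems.append(f"{key} exceeds the template maximum ({maximum})")

    return problems

issues = validate_against_template(template["data"], {
    "gpu_type": "A100", "gpu_count": 2,
    "cpu_cores": 16, "memory_gb": 64, "storage_gb": 500,
})
if issues:
    print("Configuration problems:", issues)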

Use Cases

Standardized Development Environment

Create consistent development environments across teams.
import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

def create_team_dev_template(team_name, requirements):
    """Create a standardized development template for a team."""
    
    template_config = {
        "name": f"{team_name}-dev-environment",
        "description": f"Standardized development environment for {team_name} team",
        "category": "dev",
        "docker_image": requirements.get("base_image", "ubuntu:22.04"),
        "gpu_compatible": requirements.get("needs_gpu", False),
        "default_configuration": {
            "gpu_type": requirements.get("gpu_type", "RTX4090"),
            "gpu_count": 1 if requirements.get("needs_gpu") else 0,
            "cpu_cores": requirements.get("cpu_cores", 8),
            "memory_gb": requirements.get("memory_gb", 32),
            "storage_gb": requirements.get("storage_gb", 200)
        },
        "environment_variables": {
            "TEAM": team_name,
            "ENVIRONMENT": "development",
            **requirements.get("env_vars", {})
        },
        "startup_script": f"""#!/bin/bash
echo "Setting up {team_name} development environment"

# Install team-specific tools
{requirements.get("setup_script", "")}

# Set up workspace
mkdir -p /workspace/{team_name}
cd /workspace/{team_name}

echo "Environment ready!"
""",
        "port_mappings": requirements.get("ports", [
            {"internal_port": 8888, "protocol": "tcp", "description": "Jupyter Lab"},
            {"internal_port": 8080, "protocol": "tcp", "description": "Development Server"}
        ]),
        "required_packages": requirements.get("packages", []),
        "tags": [team_name, "development", "team-standard"],
        "is_public": False
    }
    
    response = requests.post(
        "https://api.tensorone.ai/v1/clusters/templates",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=template_config
    )
    
    return response.json()

# Create data science team template
ds_requirements = {
    "base_image": "jupyter/tensorflow-notebook:latest",
    "needs_gpu": True,
    "gpu_type": "RTX4090",
    "cpu_cores": 16,
    "memory_gb": 64,
    "storage_gb": 500,
    "env_vars": {
        "JUPYTER_ENABLE_LAB": "yes",
        "DATA_PATH": "/workspace/data"
    },
    "setup_script": """
pip install --upgrade pip
pip install pandas numpy scikit-learn matplotlib seaborn plotly
pip install wandb mlflow optuna
pip install tensorflow torch torchvision
""",
    "packages": ["pandas", "numpy", "scikit-learn", "wandb", "mlflow"]
}

ds_template = create_team_dev_template("data-science", ds_requirements)

Production Inference Template

Create optimized templates for production model serving.
async function createInferenceTemplate(modelInfo) {
  const template = {
    name: `${modelInfo.name}-inference`,
    description: `Production inference template for ${modelInfo.name} model`,
    category: 'production',
    framework: modelInfo.framework,
    docker_image: modelInfo.dockerImage,
    gpu_compatible: true,
    default_configuration: {
      gpu_type: modelInfo.gpuType || 'T4',
      gpu_count: modelInfo.gpuCount || 1,
      cpu_cores: modelInfo.cpuCores || 8,
      memory_gb: modelInfo.memoryGb || 32,
      storage_gb: modelInfo.storageGb || 100
    },
    environment_variables: {
      MODEL_NAME: modelInfo.name,
      MODEL_VERSION: modelInfo.version,
      BATCH_SIZE: modelInfo.batchSize?.toString() || '8',
      MAX_SEQUENCE_LENGTH: modelInfo.maxSeqLength?.toString() || '512',
      INFERENCE_MODE: 'production',
      ...modelInfo.environmentVars
    },
    startup_script: `#!/bin/bash
echo "Starting ${modelInfo.name} inference server"

# Download model if needed
if [ ! -d "/workspace/models/${modelInfo.name}" ]; then
    echo "Downloading model..."
    ${modelInfo.downloadScript || 'echo "No download script provided"'}
fi

# Start inference server
echo "Starting inference server..."
${modelInfo.startCommand || 'python app.py'}
`,
    port_mappings: [
      {
        internal_port: 8000,
        protocol: 'tcp',
        description: 'Inference API',
        required: true
      },
      {
        internal_port: 8080,
        protocol: 'tcp', 
        description: 'Health Check',
        required: true
      }
    ],
    required_packages: modelInfo.requiredPackages || [],
    min_resources: {
      cpu_cores: 4,
      memory_gb: 16,
      storage_gb: 50
    },
    max_resources: {
      gpu_count: 4,
      cpu_cores: 32,
      memory_gb: 128,
      storage_gb: 1000
    },
    tags: [modelInfo.name, 'inference', 'production', modelInfo.framework],
    is_public: modelInfo.isPublic || false
  };
  
  const response = await fetch('https://api.tensorone.ai/v1/clusters/templates', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(template)
  });
  
  return await response.json();
}

// Create BERT inference template
const bertInference = await createInferenceTemplate({
  name: 'bert-base-uncased',
  framework: 'huggingface',
  dockerImage: 'huggingface/transformers-pytorch-gpu:latest',
  gpuType: 'T4',
  gpuCount: 1,
  cpuCores: 8,
  memoryGb: 32,
  storageGb: 100,
  batchSize: 16,
  maxSeqLength: 512,
  environmentVars: {
    HF_MODEL_ID: 'bert-base-uncased',
    TOKENIZER_PARALLELISM: 'false'
  },
  downloadScript: 'huggingface-cli download bert-base-uncased --local-dir /workspace/models/bert-base-uncased',
  startCommand: 'python inference_server.py --model-path /workspace/models/bert-base-uncased --port 8000',
  requiredPackages: ['transformers', 'torch', 'fastapi', 'uvicorn']
});
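
A template is consumed when a cluster is created from it. Cluster creation is documented in the Clusters reference, not here; the sketch below is purely illustrative and assumes a hypothetical POST /v1/clusters endpoint that accepts a template_id plus per-deployment overrides. Verify the actual contract before using it.

import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

# HYPOTHETICAL endpoint and request fields; check the Clusters reference
response = requests.post(
    "https://api.tensorone.ai/v1/clusters",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "template_id": "tmpl_abc123",
        "name": "bert-inference-prod",
        # Per-deployment overrides of the template's default_configuration
        "configuration_overrides": {"gpu_count": 2},
    },
)
print(response.json())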

Template Versioning and Updates

Manage template versions and updates.
import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

def update_template_version(template_id, updates, version_notes):
    """Update a template, bumping its semantic version."""
    
    # Get current template
    current_template = requests.get(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()
    
    if not current_template["success"]:
        return current_template
    
    # Increment the semantic version based on the type of change.
    # Pop the control flags so they are not sent to the API as template fields.
    current_version = current_template["data"]["version"]
    major, minor, patch = map(int, current_version.split('.'))

    breaking_changes = updates.pop("breaking_changes", False)
    new_features = updates.pop("new_features", False)

    if breaking_changes:
        new_version = f"{major + 1}.0.0"
    elif new_features:
        new_version = f"{major}.{minor + 1}.0"
    else:
        new_version = f"{major}.{minor}.{patch + 1}"
    
    # Prepare update payload
    update_payload = {
        "version": new_version,
        "version_notes": version_notes,
        **updates
    }
    
    # Remove None values
    update_payload = {k: v for k, v in update_payload.items() if v is not None}
    
    response = requests.put(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=update_payload
    )
    
    result = response.json()
    
    if result["success"]:
        print(f"Template updated to version {new_version}")
        print(f"Previous version: {current_version}")
        print(f"Update notes: {version_notes}")
    
    return result

# Update PyTorch template with new packages
pytorch_updates = {
    "docker_image": "pytorch/pytorch:2.3-cuda12.1-devel",
    "required_packages": [
        "tensorboard", "wandb", "transformers", "datasets", 
        "accelerate", "deepspeed"  # New packages
    ],
    "software_versions": {
        "python": "3.10",
        "pytorch": "2.3.0", 
        "cuda": "12.1",
        "cudnn": "8.9"
    },
    "new_features": True
}

update_result = update_template_version(
    "tmpl_pytorch_distributed",
    pytorch_updates,
    "Updated to PyTorch 2.3 with DeepSpeed and Accelerate support"
)

Template Management Best Practices

Template Organization

import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

def get_template_details(template_id):
    """Fetch details for a single template."""
    response = requests.get(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    return response.json()

def organize_templates_by_use_case():
    """Organize templates by use case and maintain consistency."""
    
    use_case_templates = {
        "ml_training": {
            "pytorch_distributed": "tmpl_pytorch_dist_123",
            "tensorflow_multi_gpu": "tmpl_tf_multi_456", 
            "huggingface_fine_tune": "tmpl_hf_tune_789"
        },
        "development": {
            "jupyter_data_science": "tmpl_jupyter_ds_abc",
            "vscode_remote": "tmpl_vscode_def",
            "rstudio_gpu": "tmpl_rstudio_ghi"
        },
        "production_inference": {
            "fastapi_model_server": "tmpl_fastapi_jkl",
            "triton_inference": "tmpl_triton_mno",
            "tensorrt_optimized": "tmpl_trt_pqr"
        }
    }
    
    # Validate all templates exist and are up to date
    for use_case, templates in use_case_templates.items():
        print(f"\n{use_case.upper()} Templates:")
        for name, template_id in templates.items():
            template = get_template_details(template_id)
            if template["success"]:
                data = template["data"]
                print(f"  ✅ {name}: v{data['version']} ({data['usage_statistics']['total_deployments']} deployments)")
            else:
                print(f"  ❌ {name}: Template not found or error")
    
    return use_case_templates

Error Handling

Template endpoints return structured error objects, for example:

{
  "success": false,
  "error": {
    "code": "TEMPLATE_NOT_FOUND",
    "message": "Template with ID 'tmpl_invalid' not found",
    "details": {
      "template_id": "tmpl_invalid",
      "suggestion": "Check template ID or search available templates"
    }
  }
}
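
A thin client wrapper that raises these structured errors as exceptions keeps calling code readable. A minimal sketch, assuming the success/error envelope shown above (the list endpoint returns a bare array and is not covered here):

import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]

class TemplateAPIError(Exception):
    """Raised when the API returns a structured error object."""
    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}

def get_template_or_raise(template_id):
    """Fetch a template, raising TemplateAPIError on failure."""
    response = requests.get(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    body = response.json()
    if not body.get("success", False):
        err = body.get("error", {})
        raise TemplateAPIError(
            err.get("code", "UNKNOWN"),
            err.get("message", "Unknown error"),
            err.get("details"),
        )
    return body["data"]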

Security Considerations

  • Docker Image Security: Only use trusted Docker images from verified sources
  • Environment Variables: Never store secrets in templates; use the secrets management system (a defensive pre-flight check is sketched after this list)
  • Public Templates: Carefully review public templates before use
  • Access Control: Restrict template modification permissions appropriately
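
As a defensive complement to these guidelines, a template payload can be scanned for values that look like credentials before it is created. This is a heuristic sketch only, not a substitute for real secret management; the patterns are illustrative:

import re

# Illustrative patterns for names or values that look like credentials
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|token|password)"),
    re.compile(r"^sk-[A-Za-z0-9]{20,}$"),   # common API-key shape
    re.compile(r"^AKIA[0-9A-Z]{16}$"),      # AWS access key ID shape
]

def find_suspect_env_vars(template_config):
    """Return env var names whose name or value looks like a secret."""
    suspects = []
    for name, value in template_config.get("environment_variables", {}).items():
        if any(p.search(name) or p.search(str(value)) for p in SECRET_PATTERNS):
            suspects.append(name)
    return suspects

# Example: flag a template that appears to embed credentials
suspects = find_suspect_env_vars({
    "environment_variables": {"HF_TOKEN": "hf_abcdefghijklmnop"}
})
if suspects:
    print(f"Possible secrets in template env vars: {suspects}")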

Best Practices

  1. Version Control: Always increment versions for template updates
  2. Documentation: Include comprehensive descriptions and usage examples
  3. Testing: Test templates thoroughly before making them available to teams
  4. Resource Optimization: Set appropriate resource limits to prevent overprovisioning
  5. Standardization: Use templates to enforce consistent environments across projects
  6. Maintenance: Regularly update templates with security patches and dependency updates; deprecate a template before deleting it (see the sketch after this list)
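
A common maintenance pattern is to deprecate a template before deleting it, giving teams time to migrate running clusters. The sketch below uses the update and delete endpoints; note that is_deprecated appears in the response schema above, but whether PUT accepts it directly is an assumption.

import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]
BASE_URL = "https://api.tensorone.ai/v1/clusters/templates"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def retire_template(template_id):
    """Deprecate a template, then delete it once no clusters depend on it."""
    # Mark the template deprecated so it drops out of default listings
    # (assumes PUT accepts is_deprecated; verify against your API version)
    requests.put(f"{BASE_URL}/{template_id}", headers=HEADERS,
                 json={"is_deprecated": True})

    # Delete only when usage statistics show no active clusters
    details = requests.get(f"{BASE_URL}/{template_id}", headers=HEADERS).json()
    if details["data"]["usage_statistics"]["active_clusters"] == 0:
        requests.delete(f"{BASE_URL}/{template_id}", headers=HEADERS)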

Authorizations

Authorization (string, header, required): API key authentication. Use 'Bearer YOUR_API_KEY' format.

Response

200 - application/json: a list of cluster templates. The response is of type object[].