List Models
Retrieve a list of trained models for your account.
Query Parameters
- type: Filter by model type (llm, vision, multimodal, custom)
- status: Filter by status (training, ready, deployed, archived)
- trainingJobId: Filter by originating training job
- limit: Number of models to return (1-100, default: 50)
- offset: Number of models to skip, for pagination
- sort: Field to sort by (created_at, updated_at, name, size)
- order: Sort direction (asc, desc; default: desc)
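The parameters above map directly onto a query string. The sketch below does two things:

- It validates the parameters on the client against the documented ranges and allowed values.
- It pages through all results using limit and offset.

`build_list_models_query` and `iter_all_models` are hypothetical helper names. `fetch_page` is a stand-in for however your client actually calls the endpoint. The sketch assumes each page comes back as a list of model objects; the real response shape is not shown on this page.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional
from urllib.parse import urlencode

# Allowed values as documented in the parameter list above.
MODEL_TYPES = {"llm", "vision", "multimodal", "custom"}
MODEL_STATUSES = {"training", "ready", "deployed", "archived"}
SORT_FIELDS = {"created_at", "updated_at", "name", "size"}
SORT_DIRECTIONS = {"asc", "desc"}


def build_list_models_query(
    model_type: Optional[str] = None,
    status: Optional[str] = None,
    training_job_id: Optional[str] = None,
    limit: int = 50,
    offset: int = 0,
    sort: Optional[str] = None,
    order: str = "desc",
) -> str:
    """Validate List Models parameters client-side and return a query string."""
    if model_type is not None and model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type}")
    if status is not None and status not in MODEL_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if sort is not None and sort not in SORT_FIELDS:
        raise ValueError(f"unknown sort field: {sort}")
    if order not in SORT_DIRECTIONS:
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be non-negative")

    params: Dict[str, Any] = {
        "type": model_type,
        "status": status,
        "trainingJobId": training_job_id,
        "limit": limit,
        "offset": offset,
        "sort": sort,
        "order": order,
    }
    # Drop unset optional filters so only meaningful parameters are sent.
    return urlencode({k: v for k, v in params.items() if v is not None})


def iter_all_models(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    page_size: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every model by walking limit/offset pages.

    fetch_page(limit, offset) is a hypothetical callable that performs the
    request and returns the models on that page as a list.
    """
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # A page shorter than requested means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size
```

Stopping on a short page usually saves one request. However, it assumes the server always returns a full page whenever more results exist. If the response carries a total count, comparing against that count is more robust.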
Response
Get Model Details
Retrieve detailed information about a specific model.
Response
Update Model Metadata
Update model information, tags, and metadata.
Create Model Version
Create a new version of an existing model from a training job.
Deploy Model
Deploy a model to create a new inference endpoint.
Response
Download Model
Download model files for local deployment or analysis.
Response
Compare Models
Compare performance metrics between different models or versions.
Response
Archive Model
Archive a model to reduce storage costs while retaining its metadata.
SDK Examples
Python SDK
JavaScript SDK
Model Formats
PyTorch Models
- pytorch_model.bin: Model weights in PyTorch format
- config.json: Model architecture configuration
- tokenizer.json: Tokenizer configuration and vocabulary
Hugging Face Compatible
- model.safetensors: Model weights in safetensors format
- pytorch_model.bin: PyTorch weights (legacy)
- config.json: Transformers configuration
- tokenizer_config.json: Tokenizer configuration
ONNX Export
- model.onnx: ONNX format for cross-platform deployment
- config.json: Model metadata
- tokenizer.json: Tokenizer information
TensorRT Optimization
- model.trt: TensorRT optimized engine
- config.json: Optimization parameters
- profiling_data.json: Performance profiling results
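The file lists above can double as a quick sanity check on a local model directory. The sketch below makes two assumptions:

- Every listed file is treated as required. The page does not say which files are mandatory.
- For Hugging Face weights, either `model.safetensors` or the legacy `pytorch_model.bin` is accepted.

`FORMAT_REQUIREMENTS` and `matching_formats` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

# Each format is a list of requirements. A requirement is satisfied when at
# least one of its alternative file names is present. Treating every listed
# file as required is an assumption, not documented behavior.
FORMAT_REQUIREMENTS: Dict[str, List[Set[str]]] = {
    "pytorch": [{"pytorch_model.bin"}, {"config.json"}, {"tokenizer.json"}],
    "huggingface": [
        {"model.safetensors", "pytorch_model.bin"},  # safetensors or legacy .bin weights
        {"config.json"},
        {"tokenizer_config.json"},
    ],
    "onnx": [{"model.onnx"}, {"config.json"}, {"tokenizer.json"}],
    "tensorrt": [{"model.trt"}, {"config.json"}, {"profiling_data.json"}],
}


def matching_formats(filenames: Iterable[str]) -> List[str]:
    """Return every format whose listed files are all present."""
    present = set(filenames)
    return [
        fmt
        for fmt, requirements in FORMAT_REQUIREMENTS.items()
        # A non-empty intersection means one of the alternatives exists.
        if all(present & alternatives for alternatives in requirements)
    ]
```

In practice you would pass `os.listdir(path)` for a local model directory. A directory can match more than one format, because config.json is shared and a PyTorch `.bin` also satisfies the Hugging Face weight requirement.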
Error Handling
Common Errors
Best Practices
Model Organization
- Use consistent naming conventions for models and versions
- Tag models with relevant metadata (domain, use case, quality)
- Maintain clear version histories with detailed changelogs
- Archive outdated models to reduce storage costs
Performance Optimization
- Choose appropriate deployment configurations based on latency requirements
- Use auto-scaling to handle variable workloads efficiently
- Monitor model performance metrics continuously
- Implement A/B testing for model comparisons
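One common way to A/B test two models is to split traffic deterministically by hashing a stable user identifier. With this approach, each user always sees the same model. This is a generic technique, not a documented feature of this API. `assign_variant`, its parameters, and the `control`/`treatment` labels are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user into 'control' or 'treatment'."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be in [0, 1]")
    # Salting with the experiment name lets different tests split users independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets (0..9999), i.e. basis points.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"
```

Routing `treatment` to the candidate model's endpoint and `control` to the current one keeps each user's experience consistent. This makes per-user metrics comparable across the two groups.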
Security and Compliance
- Implement proper access controls for sensitive models
- Maintain audit trails for model deployments
- Use encryption for model files and communications
- Run regular security scans on deployed models
Model deployments typically take 3-5 minutes to become active. Larger models may require additional time for optimization and loading.
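Because activation usually takes a few minutes, a client typically polls for status instead of blocking on the deploy call. The helper below is a sketch that polls an injected `get_status` callable. The following are all assumptions, not documented values:

- the status strings `active` and `failed`;
- the 15-minute default timeout, chosen to leave headroom for larger models;
- the 15-second polling interval.

Injecting `sleep` and `clock` keeps the helper testable without real waiting.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout_s: float = 15 * 60,  # assumed headroom above the typical 3-5 minutes
    interval_s: float = 15.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a deployment until it reports 'active' or the timeout expires.

    The 'active' and 'failed' status strings are hypothetical placeholders.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "active":
            return status
        if status == "failed":
            raise RuntimeError("deployment failed")
        if clock() >= deadline:
            raise TimeoutError(f"deployment still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A fixed interval keeps the sketch simple. For long deployments, exponential backoff would reduce the number of status requests.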
Archived models can be restored within the retention period. After that, they are permanently deleted and cannot be recovered.
Authorizations
API key authentication. Use 'Bearer YOUR_API_KEY' format.
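Bearer authentication means sending the key in the standard `Authorization` request header. The sketch below builds such a request with Python's standard library only. `authorized_request` is a hypothetical helper name, and the `Accept` header reflects the documented `application/json` response type.

```python
from urllib.request import Request


def authorized_request(url: str, api_key: str) -> Request:
    """Build a GET request that sends the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key must not be empty")
    headers = {
        # Bearer tokens travel in the standard Authorization header.
        "Authorization": f"Bearer {api_key}",
        # Successful responses are documented as application/json.
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")
```

For example, you could build a request with `authorized_request("https://api.example.com/models?limit=10", key)` and send it with `urllib.request.urlopen`. `api.example.com` and the `/models` path are placeholders, because this page does not state the real base URL. Reading the key from an environment variable avoids hard-coding it in source.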
Query Parameters
Filter by model status. Available options: training, completed, failed, cancelled.
Maximum number of models to return. Required range: 1 <= x <= 100.
Response
200 - application/json
List of training models
The response is of type object.