Core Technologies
Hypervisor VMM
`hypervisor-vmm` is the low-level engine that powers TensorOne’s GPU Virtual Private Servers (VPS). It's designed to abstract bare-metal GPU instances into scalable, isolated compute environments optimized for high-throughput machine learning, AI inference, and custom containerized workloads.
What is a Virtual Machine Monitor (VMM)?
The Virtual Machine Monitor (VMM), or hypervisor, is a lightweight, high-efficiency layer responsible for:
- Virtualizing hardware-level GPU access
- Isolating cluster workloads
- Enforcing multi-tenant execution safety
- Ensuring container-to-GPU passthrough performance
Unlike traditional hypervisors, `hypervisor-vmm` is container-native, tuned for Ubuntu containers, direct NVLink access, and zero-penalty Docker image mounting.
Key Technologies
GPU Passthrough
TensorOne's clusters expose real NVIDIA GPUs (such as the A100, RTX 3090, and A6000) to containers through direct PCIe passthrough. The VMM preserves full CUDA compatibility: no emulation layer, maximum throughput.
```
[ Container ] ↔ [ hypervisor-vmm ] ↔ [ PCIe → Physical GPU ]
```
This enables:
- Full access to VRAM
- Compatibility with PyTorch, TensorFlow, etc.
- Real-time monitoring of GPU memory and utilization
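If the container image ships the NVIDIA userspace tools and a CUDA framework (the `tensorone/llm-starter` image is assumed here to include both), a quick sanity check from inside a running container confirms the passthrough path:

```bash
# The physical device is visible directly: nvidia-smi reports the real GPU
# name, driver version, VRAM, and live utilization over the PCIe passthrough.
nvidia-smi --query-gpu=name,driver_version,memory.total,utilization.gpu \
           --format=csv

# Confirm CUDA works end-to-end from a framework (assumes PyTorch is installed).
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```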
Dynamic Resource Mapping
Each cluster is backed by a dynamic VM profile:
- vCPU: Isolated logical CPUs bound per container
- RAM: High-bandwidth DDR5 slices allocated on demand
- Storage:
  - Ephemeral Container Disk (fast, non-persistent)
  - Persistent Volume (durable, survives restarts)
Resources scale elastically via GraphQL or the CLI with `tensoronecli create clusters`, as sketched below.
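The mapping from profile to flags is one-to-one in the create command. The sizing below is illustrative; the full launch example at the end of this page uses the same flags:

```bash
# Illustrative sizing only. Flag-to-profile mapping:
#   --mem               -> RAM slice (GB)
#   --containerDiskSize -> ephemeral container disk (GB, non-persistent)
#   --volumeSize        -> persistent volume (GB, survives restarts)
tensoronecli create clusters \
  --gpuType "NVIDIA A100" \
  --imageName "tensorone/llm-starter" \
  --mem 32 \
  --containerDiskSize 20 \
  --volumeSize 40
```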
Secure Multi-Tenant Scheduling
To ensure user isolation across shared infrastructure, `hypervisor-vmm` leverages:
- AppArmor profiles per container
- Seccomp policies
- Encrypted proxy channels (TLS/SSL)
- Idle timeout termination
These safeguards make it possible to run multiple public-facing endpoints securely on shared GPU hosts.
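These profiles are applied by the platform itself, but the effect is comparable to the standard per-container confinement knobs Docker exposes. The sketch below is an analogy with hypothetical profile names, not TensorOne's actual configuration:

```bash
# Analogy only -- hypothetical profile names, not TensorOne internals.
#   apparmor=<profile>  -> per-container AppArmor confinement
#   seccomp=<policy>    -> syscall allow-list for the container
docker run --rm \
  --security-opt apparmor=tenant-profile \
  --security-opt seccomp=/etc/seccomp/tenant.json \
  --gpus all \
  tensorone/llm-starter python run.py
```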
Boot Flow Architecture
```mermaid
graph TD
    A[Project Template] --> B[hypervisor-vmm]
    B --> C[Container Provisioning]
    C --> D[GPU & Disk Binding]
    D --> E[Runtime Session]
    E --> F[Web Proxy / Serverless Endpoint]
```
Every project deployment initializes a virtual environment from a template and connects through `hypervisor-vmm` to hardware resources.
Developer-Facing API
You can configure hypervisor-backed clusters via:
- GraphQL SDK: `clusterFindAndDeployOnDemand`, `clusterRentInterruptable`
- TensorOne CLI: `tensoronecli create clusters`, `tensoronecli start cluster`
- Environment variables such as `TENSORONE_CLUSTER_ID` and `TENSORONE_API_KEY`
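As a minimal sketch of the GraphQL path, assuming a curl-style HTTP call: the endpoint URL and the mutation's input fields below are assumptions, so check the GraphQL Configuration docs for the real schema:

```bash
# Endpoint URL and input shape are assumed for illustration -- consult the
# GraphQL Configuration docs for the actual schema.
curl -s https://api.tensorone.ai/graphql \
  -H "Authorization: Bearer ${TENSORONE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"query": "mutation { clusterFindAndDeployOnDemand(input: {gpuType: \"NVIDIA A100\", imageName: \"tensorone/llm-starter\"}) { id } }"}'
```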
Optimized for Machine Learning
The `hypervisor-vmm` was built with AI workloads in mind:
- Optimized I/O for model loading
- NVLink for large-model multi-GPU support
- Auto-scaler integration with TensorOne endpoints
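On a multi-GPU cluster, you can inspect the NVLink fabric from inside the container with the standard NVIDIA tooling:

```bash
# Per-link NVLink status for each GPU.
nvidia-smi nvlink --status

# Interconnect topology matrix; NV# entries mark NVLink paths between GPUs.
nvidia-smi topo -m
```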
Try it out
Launch a dev cluster now:
```bash
tensoronecli create clusters \
  --gpuType "NVIDIA A100" \
  --imageName "tensorone/llm-starter" \
  --containerDiskSize 20 \
  --volumeSize 40 \
  --mem 32 \
  --args "python run.py"
```
For more details on managing your clusters and infrastructure, see Manage Clusters and GraphQL Configuration.