

Hypervisor VMM

hypervisor-vmm: GPU Virtualization Engine for TensorOne.

hypervisor-vmm is the low-level engine that powers TensorOne’s GPU Virtual Private Servers (VPS). It abstracts bare-metal GPU instances into scalable, isolated compute environments optimized for high-throughput machine learning (ML), AI inference, and containerized workloads.



What is a Virtual Machine Monitor (VMM)?

A Virtual Machine Monitor (VMM), or hypervisor, is a lightweight, high-efficiency execution layer responsible for:

  • Virtualizing hardware-level GPU access
  • Isolating cluster workloads
  • Enforcing multi-tenant execution safety
  • Enabling container-to-GPU passthrough

Unlike traditional hypervisors, hypervisor-vmm is container-native: it is optimized for Ubuntu-based containers, supports direct NVLink passthrough, and adds near-zero overhead when mounting Docker images.


Key Technologies

GPU Passthrough

TensorOne clusters use physical NVIDIA GPUs (e.g., A100, RTX 3090, A6000) attached via direct PCIe passthrough:

[ Container ] ↔ [ hypervisor-vmm ] ↔ [ PCIe → Physical GPU ]

This configuration ensures:

  • Full CUDA compatibility
  • Maximum VRAM access
  • Compatibility with ML libraries (e.g., PyTorch, TensorFlow)
  • Real-time metrics: memory, utilization, temperature
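
Because the GPU is attached as a real PCIe device, standard NVIDIA tooling works unchanged inside the container. As a minimal, non-TensorOne-specific sketch, the pynvml (nvidia-ml-py) bindings can read the same memory, utilization, and temperature metrics listed above:

# Sketch: reading live GPU metrics from inside a container via NVML.
# Requires the nvidia-ml-py (pynvml) package and a passed-through NVIDIA GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first visible GPU

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)           # VRAM usage in bytes
util = pynvml.nvmlDeviceGetUtilizationRates(handle)    # GPU / memory utilization (%)
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

print(f"VRAM: {mem.used / 2**30:.1f} GiB used of {mem.total / 2**30:.1f} GiB")
print(f"Utilization: {util.gpu}%  Temperature: {temp} C")
pynvml.nvmlShutdown()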

Dynamic Resource Mapping

Each cluster is provisioned with a dynamic profile:

  • vCPU: Dedicated logical CPUs
  • RAM: DDR5 slices with bandwidth isolation
  • Storage:
    • Ephemeral Container Disk (non-persistent, fast I/O)
    • Persistent Volume (durable, reboot-safe)

Resources can be dynamically scaled via the GraphQL API or tensoronecli.
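
As a sketch of how such a profile can be represented programmatically, the example below models the documented resource fields in Python and maps them onto the tensoronecli flags used in the "Try it Out" section; the class and field names are illustrative, not part of the official SDK.

# Illustrative only: a dynamic resource profile expressed in Python, mapped
# onto the documented tensoronecli flags. Field names here are assumptions.
from dataclasses import dataclass

@dataclass
class ClusterProfile:
    vcpus: int               # dedicated logical CPUs
    memory_gb: int           # DDR5 RAM slice
    container_disk_gb: int   # ephemeral container disk (non-persistent)
    volume_gb: int           # persistent volume (reboot-safe)

def to_cli_args(profile: ClusterProfile) -> list[str]:
    # Only flags shown in this page's CLI example are mapped here.
    return [
        "--mem", str(profile.memory_gb),
        "--containerDiskSize", str(profile.container_disk_gb),
        "--volumeSize", str(profile.volume_gb),
    ]

print(to_cli_args(ClusterProfile(vcpus=8, memory_gb=32, container_disk_gb=20, volume_gb=40)))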

Secure Multi-Tenant Scheduling

To enforce secure, isolated execution across shared infrastructure:

  • AppArmor & Seccomp enforcement per container
  • Encrypted TLS proxy communication
  • Automatic idle timeouts & sandboxing

This allows multiple public endpoints to run safely on shared GPU hosts.
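
For illustration only, the snippet below shows how per-container AppArmor and seccomp confinement is typically attached when launching a container with the Docker SDK for Python. It demonstrates the kind of mechanisms hypervisor-vmm enforces, not the TensorOne scheduler itself; the seccomp profile path is a placeholder.

# Generic example of per-container confinement using docker (docker-py).
import docker

client = docker.from_env()
container = client.containers.run(
    "ubuntu:22.04",
    command="sleep 3600",
    detach=True,
    security_opt=[
        "apparmor=docker-default",                    # AppArmor profile
        "seccomp=/etc/docker/seccomp-default.json",   # syscall filter (placeholder path)
        "no-new-privileges",                          # block privilege escalation
    ],
)
print(container.id)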


Boot Flow Architecture

graph TD
    A[Project Template] --> B[hypervisor-vmm]
    B --> C[Container Provisioning]
    C --> D[GPU & Disk Binding]
    D --> E[Runtime Session]
    E --> F[Web Proxy / Serverless Endpoint]

Each project deployment initializes from a template, attaches to physical GPU/disk resources via hypervisor-vmm, and is then exposed via runtime proxies.


Developer-Facing Interfaces

You can control hypervisor-backed compute environments through:

  • GraphQL SDK:
    • clusterFindAndDeployOnDemand
    • clusterRentInterruptable
  • TensorOne CLI:
    • tensoronecli create clusters
    • tensoronecli start cluster
  • Environment Variables:
    • TENSORONE_CLUSTER_ID
    • TENSORONE_API_KEY
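
The sketch below ties these interfaces together: it reads the documented API key from the environment and submits the clusterFindAndDeployOnDemand operation over GraphQL. The endpoint URL, authentication header, and the input/return fields (mirrored from the CLI flags) are assumptions; consult the GraphQL Configuration Reference for the actual schema.

# Sketch: calling the GraphQL SDK from Python. Endpoint URL, auth header,
# and field names are assumptions, shown here only to illustrate the flow.
import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]        # documented environment variable
ENDPOINT = "https://api.tensorone.ai/graphql"    # placeholder URL

query = """
mutation {
  clusterFindAndDeployOnDemand(input: {
    gpuType: "NVIDIA A100"            # field names mirror the CLI flags
    imageName: "tensorone/llm-starter"
    containerDiskSize: 20
    volumeSize: 40
  }) {
    id                                # returned fields are illustrative
  }
}
"""

resp = requests.post(ENDPOINT, json={"query": query},
                     headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
print(resp.json())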

Optimized for Machine Learning

hypervisor-vmm is designed for ML workflows:

  • Optimized disk I/O for rapid model loading
  • NVLink-enabled for multi-GPU workloads
  • Integrated with endpoint auto-scalers
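
As a quick sanity check (illustrative, and assuming a multi-GPU profile was provisioned), PyTorch can confirm GPU visibility and direct peer access between devices:

# Sketch: verifying multi-GPU visibility and peer (NVLink/PCIe) access in PyTorch.
import torch

count = torch.cuda.device_count()
print(f"Visible GPUs: {count}")
for i in range(count):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")

if count >= 2:
    # True when device 0 can access device 1's memory directly,
    # e.g. over NVLink on NVLink-enabled hosts.
    print("Peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))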

Try it Out

Launch a development cluster using the CLI:

tensoronecli create clusters \
  --gpuType "NVIDIA A100" \
  --imageName "tensorone/llm-starter" \
  --containerDiskSize 20 \
  --volumeSize 40 \
  --mem 32 \
  --args "python run.py"

For more details, refer to Managing Clusters and the GraphQL Configuration Reference.
