

Hypervisor VMM

hypervisor-vmm: GPU Virtualization Engine for TensorOne.

hypervisor-vmm is the low-level engine that powers TensorOne’s GPU Virtual Private Servers (VPS). It abstracts bare-metal GPU instances into scalable, isolated compute environments optimized for high-throughput machine learning (ML), AI inference, and containerized workloads.



What is a Virtual Machine Monitor (VMM)?

A Virtual Machine Monitor (VMM), or hypervisor, is a lightweight, high-efficiency execution layer responsible for:

  • Virtualizing hardware-level GPU access
  • Isolating cluster workloads
  • Enforcing multi-tenant execution safety
  • Enabling container-to-GPU passthrough

Unlike traditional hypervisors, hypervisor-vmm is container-native: it is optimized for Ubuntu-based containers, supports direct NVLink passthrough, and adds near-zero overhead when mounting Docker images.


Key Technologies

GPU Passthrough

TensorOne clusters use physical NVIDIA GPUs (e.g., A100, RTX 3090, A6000) attached via direct PCIe passthrough:

[ Container ] ↔ [ hypervisor-vmm ] ↔ [ PCIe → Physical GPU ]

This configuration ensures:

  • Full CUDA compatibility
  • Maximum VRAM access
  • Compatibility with ML libraries (e.g., PyTorch, TensorFlow)
  • Real-time metrics: memory, utilization, temperature
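
Because the GPU is attached as a real PCIe device, standard NVIDIA tooling works unchanged inside the container. As a minimal, non-TensorOne-specific sketch, the pynvml (nvidia-ml-py) bindings can read the same memory, utilization, and temperature metrics listed above:

# Sketch: reading live GPU metrics from inside a container via NVML.
# Requires the nvidia-ml-py (pynvml) package and a passed-through NVIDIA GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first visible GPU

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)           # VRAM usage in bytes
util = pynvml.nvmlDeviceGetUtilizationRates(handle)    # GPU / memory utilization (%)
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

print(f"VRAM: {mem.used / 2**30:.1f} GiB used of {mem.total / 2**30:.1f} GiB")
print(f"Utilization: {util.gpu}%  Temperature: {temp} C")
pynvml.nvmlShutdown()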

Dynamic Resource Mapping

Each cluster is provisioned with a dynamic profile:

  • vCPU: Dedicated logical CPUs
  • RAM: DDR5 slices with bandwidth isolation
  • Storage:
    • Ephemeral Container Disk (non-persistent, fast I/O)
    • Persistent Volume (durable, reboot-safe)

Resources can be dynamically scaled via the GraphQL API or tensoronecli.
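
As a sketch of how such a profile can be represented programmatically, the example below models the documented resource fields in Python and maps them onto the tensoronecli flags used in the "Try it Out" section; the class and field names are illustrative, not part of the official SDK.

# Illustrative only: a dynamic resource profile expressed in Python, mapped
# onto the documented tensoronecli flags. Field names here are assumptions.
from dataclasses import dataclass

@dataclass
class ClusterProfile:
    vcpus: int               # dedicated logical CPUs
    memory_gb: int           # DDR5 RAM slice
    container_disk_gb: int   # ephemeral container disk (non-persistent)
    volume_gb: int           # persistent volume (reboot-safe)

def to_cli_args(profile: ClusterProfile) -> list[str]:
    # Only flags shown in this page's CLI example are mapped here.
    return [
        "--mem", str(profile.memory_gb),
        "--containerDiskSize", str(profile.container_disk_gb),
        "--volumeSize", str(profile.volume_gb),
    ]

print(to_cli_args(ClusterProfile(vcpus=8, memory_gb=32, container_disk_gb=20, volume_gb=40)))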

Secure Multi-Tenant Scheduling

To enforce secure, isolated execution across shared infrastructure:

  • AppArmor & Seccomp enforcement per container
  • Encrypted TLS proxy communication
  • Automatic idle timeouts & sandboxing

This allows multiple public endpoints to run safely on shared GPU hosts.
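
For illustration only, the snippet below shows how per-container AppArmor and seccomp confinement is typically attached when launching a container with the Docker SDK for Python. It demonstrates the kind of mechanisms hypervisor-vmm enforces, not the TensorOne scheduler itself; the seccomp profile path is a placeholder.

# Generic example of per-container confinement using docker (docker-py).
import docker

client = docker.from_env()
container = client.containers.run(
    "ubuntu:22.04",
    command="sleep 3600",
    detach=True,
    security_opt=[
        "apparmor=docker-default",                    # AppArmor profile
        "seccomp=/etc/docker/seccomp-default.json",   # syscall filter (placeholder path)
        "no-new-privileges",                          # block privilege escalation
    ],
)
print(container.id)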


Boot Flow Architecture

graph TD
    A[Project Template] --> B[hypervisor-vmm]
    B --> C[Container Provisioning]
    C --> D[GPU & Disk Binding]
    D --> E[Runtime Session]
    E --> F[Web Proxy / Serverless Endpoint]

Each project deployment initializes from a template, attaches to physical GPU/disk resources via hypervisor-vmm, and is then exposed via runtime proxies.


Developer-Facing Interfaces

You can control hypervisor-backed compute environments through:

  • GraphQL SDK:
    • clusterFindAndDeployOnDemand
    • clusterRentInterruptable
  • TensorOne CLI:
    • tensoronecli create clusters
    • tensoronecli start cluster
  • Environment Variables:
    • TENSORONE_CLUSTER_ID
    • TENSORONE_API_KEY
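
The sketch below ties these interfaces together: it reads the documented API key from the environment and submits the clusterFindAndDeployOnDemand operation over GraphQL. The endpoint URL, authentication header, and the input/return fields (mirrored from the CLI flags) are assumptions; consult the GraphQL Configuration Reference for the actual schema.

# Sketch: calling the GraphQL SDK from Python. Endpoint URL, auth header,
# and field names are assumptions, shown here only to illustrate the flow.
import os
import requests

API_KEY = os.environ["TENSORONE_API_KEY"]        # documented environment variable
ENDPOINT = "https://api.tensorone.ai/graphql"    # placeholder URL

query = """
mutation {
  clusterFindAndDeployOnDemand(input: {
    gpuType: "NVIDIA A100"            # field names mirror the CLI flags
    imageName: "tensorone/llm-starter"
    containerDiskSize: 20
    volumeSize: 40
  }) {
    id                                # returned fields are illustrative
  }
}
"""

resp = requests.post(ENDPOINT, json={"query": query},
                     headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
print(resp.json())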

Optimized for Machine Learning

hypervisor-vmm is designed for ML workflows:

  • Optimized disk I/O for rapid model loading
  • NVLink-enabled for multi-GPU workloads
  • Integrated with endpoint auto-scalers
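
As a quick sanity check (illustrative, and assuming a multi-GPU profile was provisioned), PyTorch can confirm GPU visibility and direct peer access between devices:

# Sketch: verifying multi-GPU visibility and peer (NVLink/PCIe) access in PyTorch.
import torch

count = torch.cuda.device_count()
print(f"Visible GPUs: {count}")
for i in range(count):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")

if count >= 2:
    # True when device 0 can access device 1's memory directly,
    # e.g. over NVLink on NVLink-enabled hosts.
    print("Peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))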

Try it Out

Launch a development cluster using the CLI:

tensoronecli create clusters \
  --gpuType "NVIDIA A100" \
  --imageName "tensorone/llm-starter" \
  --containerDiskSize 20 \
  --volumeSize 40 \
  --mem 32 \
  --args "python run.py"

For more details, refer to Managing Clusters and the GraphQL Configuration Reference.
