
Core Technologies

Hypervisor VMM

hypervisor-vmm is the low-level engine that powers TensorOne’s GPU Virtual Private Servers (VPS). It's designed to abstract bare-metal GPU instances into scalable, isolated compute environments optimized for high-throughput machine learning, AI inference, and custom containerized workloads.


What is a Virtual Machine Monitor (VMM)?

The Virtual Machine Monitor (VMM), or hypervisor, is a lightweight, high-efficiency layer responsible for:

  • Virtualizing hardware-level GPU access
  • Isolating cluster workloads
  • Enforcing multi-tenant execution safety
  • Ensuring container-to-GPU passthrough performance

Unlike traditional hypervisors, hypervisor-vmm is container-native: it is tuned to run Ubuntu containers, provide direct NVLink access, and mount Docker images with minimal overhead.


Key Technologies

GPU Passthrough

TensorOne's clusters expose physical NVIDIA GPUs (such as the A100, RTX 3090, and RTX A6000) to containers through direct PCIe passthrough. The VMM preserves full CUDA compatibility: there is no emulation layer, so workloads run at native throughput.

[ Container ] ↔ [ hypervisor-vmm ] ↔ [ PCIe → Physical GPU ]

This enables:

  • Full access to VRAM
  • Compatibility with PyTorch, TensorFlow, etc.
  • Real-time monitoring of GPU memory and utilization
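
Because the GPU is passed through rather than emulated, standard NVIDIA tooling works unchanged inside the container. A minimal sanity check, assuming a CUDA-enabled image with nvidia-smi and PyTorch installed:

# Report each GPU's name, total and used VRAM, and utilization.
nvidia-smi --query-gpu=name,memory.total,memory.used,utilization.gpu --format=csv

# Confirm PyTorch sees the passed-through device.
python -c "import torch; print(torch.cuda.get_device_name(0))"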

Dynamic Resource Mapping

Each cluster is backed by a dynamic VM profile:

  • vCPU: Isolated logical CPUs bound per container
  • RAM: High-bandwidth DDR5 slices allocated on demand
  • Storage:
    • Ephemeral Container Disk (fast, non-persistent)
    • Persistent Volume (durable; survives restarts)

Resources scale elastically via the GraphQL API or the CLI (tensoronecli create clusters).
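
As a quick way to see the storage split from inside a running container, the two disk types usually appear as separate mounts. The paths below are illustrative assumptions, not documented TensorOne conventions:

# Ephemeral container disk: fast scratch space, wiped on restart.
df -h /

# Persistent volume: survives restarts (mount point is an assumed example).
df -h /workspace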


Secure Multi-Tenant Scheduling

To ensure user isolation across shared infrastructure, hypervisor-vmm leverages:

  • AppArmor profiles per container
  • Seccomp policies
  • Encrypted proxy channels (TLS/SSL)
  • Idle timeout termination

These safeguards make it possible to run multiple public-facing endpoints securely on shared GPU hosts.
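
For intuition, the same class of safeguards can be expressed with standard Docker security options. This is a generic sketch of the mechanism, not TensorOne's internal configuration; the profile name and policy path are placeholders:

# Attach a per-container AppArmor profile and a seccomp syscall policy.
docker run --gpus all \
  --security-opt apparmor=tensorone-default \
  --security-opt seccomp=/etc/tensorone/seccomp.json \
  tensorone/llm-starter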


Boot Flow Architecture

graph TD
    A[Project Template] --> B[hypervisor-vmm]
    B --> C[Container Provisioning]
    C --> D[GPU & Disk Binding]
    D --> E[Runtime Session]
    E --> F[Web Proxy / Serverless Endpoint]

Every project deployment initializes a virtual environment from a template and attaches to hardware resources through hypervisor-vmm.


Developer-Facing API

You can configure hypervisor-backed clusters via:

  • GraphQL SDK: clusterFindAndDeployOnDemand, clusterRentInterruptable
  • TensorOne CLI: tensoronecli create clusters, tensoronecli start cluster
  • Environment variables like TENSORONE_CLUSTER_ID, TENSORONE_API_KEY
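
A typical CLI session using the pieces above might look like the following. The command names come from these docs; whether tensoronecli reads the cluster ID from the environment or requires an explicit flag is an assumption:

# Authenticate, then start a previously created cluster.
export TENSORONE_API_KEY="<your-api-key>"
export TENSORONE_CLUSTER_ID="<cluster-id>"
tensoronecli start cluster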

Optimized for Machine Learning

The hypervisor-vmm was built with AI workloads in mind:

  • Optimized I/O for model loading
  • NVLink for large-model multi-GPU support
  • Auto-scaler integration with TensorOne endpoints
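
To verify that NVLink is active on a multi-GPU cluster, standard NVIDIA tools work from inside the container:

# Print the GPU interconnect matrix; NV# entries indicate NVLink paths.
nvidia-smi topo -m

# Show per-link NVLink status for each GPU.
nvidia-smi nvlink --status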

Try it out

Launch a dev cluster now:

tensoronecli create clusters \
  --gpuType "NVIDIA A100" \
  --imageName "tensorone/llm-starter" \
  --containerDiskSize 20 \
  --volumeSize 40 \
  --mem 32 \
  --args "python run.py"

For more details on managing your clusters and infrastructure, see Manage Clusters and GraphQL Configuration.
