
Experimental Projects

Text to Video

Our Text to Video project explores the frontier of prompt-driven motion generation—from abstract animations to coherent cinematic sequences, with frame-level and motion-level control.

We're actively experimenting with diffusion-based video models, contributing training data, testing inference chains, and building tooling for temporally consistent generation workflows.


Core Areas of Focus

1. Model Research & Evaluation

We evaluate and contribute across major open-source and research models:

  • ModelScope T2V: Early-stage text2video diffusion
  • Zeroscope: SD-based 576x320 and 1024x576 resolution video models
  • Pika & CogVideo: Prompt-to-video with multilingual support
  • AnimateDiff: Motion module layered on top of image generation
  • VideoCrafter: Transformer-aware video diffusion

We run comparative tests on the following axes (a scoring sketch follows the list):

  • Motion smoothness
  • Prompt-object consistency
  • Loopability
  • Speed vs. quality tradeoffs
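
To make the first two axes concrete, here is a minimal scoring sketch, assuming frames have already been extracted to PNGs and using open_clip; the model choice and the frame-to-frame similarity proxy for smoothness are illustrative assumptions, not a fixed benchmark.

# Sketch: score one generated clip for prompt consistency and smoothness.
import glob
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k", device=device)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def score_clip(frame_dir, prompt):
    frames = sorted(glob.glob(f"{frame_dir}/*.png"))
    images = torch.stack([preprocess(Image.open(f)) for f in frames]).to(device)
    with torch.no_grad():
        img = model.encode_image(images)
        txt = model.encode_text(tokenizer([prompt]).to(device))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    consistency = (img @ txt.T).mean().item()                # prompt-object consistency
    smoothness = (img[:-1] * img[1:]).sum(-1).mean().item()  # frame-to-frame similarity
    return consistency, smoothness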

2. Multi-Stage Generation Pipelines

To improve quality and control, we chain together:

  1. Text → Keyframes (with Stable Diffusion or SDXL)
  2. Keyframes → Interpolation (using RIFE or FILM)
  3. Motion-aware resynthesis (via AnimateDiff or ControlNet)
  4. Optional: audio synchronization for speech-matched video

We deploy this workflow using TensorOne Clusters to parallelize each stage.
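
In sketch form, the orchestration of these stages looks roughly like the following. The SDXL keyframe stage uses the standard diffusers API; rife_interpolate.py and animatediff_resynth.py are hypothetical wrapper scripts standing in for the RIFE/FILM and AnimateDiff/ControlNet steps, whose real entry points vary by checkout.

# Sketch of the staged pipeline; wrapper script names are placeholders.
import os
import subprocess
import torch
from diffusers import StableDiffusionXLPipeline

def generate_keyframes(prompt, n_keyframes=4, out_dir="keyframes"):
    # Stage 1: Text -> Keyframes with SDXL (different seed per keyframe).
    os.makedirs(out_dir, exist_ok=True)
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    for i in range(n_keyframes):
        image = pipe(prompt, num_inference_steps=30,
                     generator=torch.Generator("cuda").manual_seed(i)).images[0]
        image.save(os.path.join(out_dir, f"key_{i:03d}.png"))
    return out_dir

def interpolate(keyframe_dir, out_dir="interp"):
    # Stage 2: Keyframes -> Interpolation via a hypothetical RIFE/FILM wrapper.
    subprocess.run(["python", "rife_interpolate.py",
                    "--input", keyframe_dir, "--output", out_dir], check=True)
    return out_dir

def resynthesize(frame_dir, prompt, out_path="final.mp4"):
    # Stage 3: Motion-aware resynthesis via a hypothetical AnimateDiff/ControlNet wrapper.
    subprocess.run(["python", "animatediff_resynth.py",
                    "--frames", frame_dir, "--prompt", prompt,
                    "--out", out_path], check=True)
    return out_path

def run_pipeline(prompt):
    keys = generate_keyframes(prompt)
    frames = interpolate(keys)
    return resynthesize(frames, prompt)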


3. LoRA and Motion Module Training

We've contributed custom LoRA adapters for motion styles like:

  • cyberpunk-tracking: stylized tracking camera motion
  • 3d-turntable-spin: slow, full 360° turntable rotation
  • anime-fight-loop: fast-cut, jittery action sequences

We have also trained motion-aware AnimateDiff modules using the following sources (a loading sketch follows the list):

  • Cinemagraph datasets
  • TikTok-style clips with embedded captions
  • Storyboard-to-animation transitions
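
Below, assuming the diffusers AnimateDiff pipeline, is what loading one of these motion-style adapters looks like at inference time; the tensorone/motion-loras repo id and the weight filename are placeholders, and the base checkpoint can be any SD 1.5 model.

# Sketch: apply a motion-style LoRA on top of AnimateDiff with diffusers.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # any SD 1.5 base checkpoint
    motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
# Placeholder repo id and filename for one of our motion-style LoRAs.
pipe.load_lora_weights("tensorone/motion-loras",
                       weight_name="cyberpunk-tracking.safetensors")

frames = pipe("a drone flying through a cyberpunk alleyway at night",
              num_frames=16, guidance_scale=7.5).frames[0]
export_to_gif(frames, "cyberpunk_tracking.gif")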

Tooling and Infra

We maintain internal tools for:

  • Prompt-to-script breakdown
  • Batch video rendering across cluster queues
  • Scene interpolation validation
  • Video embedding indexing (CLIP + motion embeddings); see the indexing sketch below
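
The indexing step, in sketch form, assuming per-clip embeddings have already been mean-pooled into one vector per video; FAISS is our assumption here, and the motion-embedding half is omitted.

# Sketch: cosine-similarity index over per-video embedding vectors.
import numpy as np
import faiss

def build_index(embeddings: np.ndarray) -> faiss.IndexFlatIP:
    vectors = embeddings.astype("float32")
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # inner product == cosine
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return index

def search(index, query_vec: np.ndarray, k: int = 5):
    q = query_vec.astype("float32")[None, :]
    q /= np.linalg.norm(q)
    scores, ids = index.search(q, k)
    return list(zip(ids[0].tolist(), scores[0].tolist()))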

Sample job command:

tensoronecli create clusters --gpuType "A100" --imageName "text2video-train" --args "bash run_pipeline.sh"
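
Batch rendering is then a loop over prompts that launches one job per prompt with the same flags; forwarding the prompt through --args into run_pipeline.sh is an assumption about that script's interface.

# Sketch: one cluster job per prompt, reusing the flags from the sample command.
import shlex
import subprocess

prompts = [
    "A drone flying through a cyberpunk alleyway at night",
    "An astronaut floating in space, Earth spinning in the background",
]

for prompt in prompts:
    subprocess.run([
        "tensoronecli", "create", "clusters",
        "--gpuType", "A100",
        "--imageName", "text2video-train",
        # Passing the prompt to run_pipeline.sh is an assumed interface.
        "--args", f"bash run_pipeline.sh {shlex.quote(prompt)}",
    ], check=True)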

Experimental Outputs

Some of our early successful generations include:

  • “A drone flying through a cyberpunk alleyway at night”
  • “An astronaut floating in space, Earth spinning in the background”
  • “Studio-lit product ad with animated particles and reflections”
  • “Low-poly 3D character dancing in sync with background music”

Research Goals Ahead

  • Long-form generation beyond 8s without quality collapse
  • Audio-synced animation with Whisper-aligned captions (see the timestamp sketch below)
  • Prompt-to-storyboard-to-video chaining
  • Fine-grained motion editing (speed ramping, masking)
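
For the Whisper-aligned captions goal, word-level timestamps are the raw material; a minimal sketch with the openai-whisper package, with the model size and file name as illustrative placeholders:

# Sketch: word-level timestamps that a caption or animation stage could key off.
import whisper

model = whisper.load_model("base")
result = model.transcribe("narration.wav", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f'{word["start"]:6.2f}s  {word["end"]:6.2f}s  {word["word"]}')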

Text-to-video is still an experimental frontier—but one with immense potential.

We're not just waiting for breakthroughs.
We're prototyping them—frame by frame, clip by clip.

