Experimental Projects
Text to Video
Our Text to Video project explores the frontier of prompt-driven motion generation—from abstract animations to coherent cinematic sequences, with frame-level and motion-level control.
We're actively experimenting with diffusion-based video models, contributing training data, testing inference chains, and building tooling for temporally consistent generation workflows.
Core Areas of Focus
1. Model Research & Evaluation
We evaluate and contribute across major open-source and research models:
- ModelScope T2V: Early-stage text2video diffusion
- Zeroscope: SD-based 576x320 and 1024x576 resolution video models
- Pika & CogVideo: Prompt-to-video with multilingual support
- AnimateDiff: Motion module layered on top of image generation
- VideoCrafter: Transformer-aware video diffusion
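As a point of reference, the ModelScope checkpoint can be exercised end to end in a few lines. A minimal sketch, assuming the Hugging Face diffusers distribution of the model; the checkpoint ID and sampling settings here are illustrative, not our evaluation config:

# Sketch: generate a short clip from the ModelScope T2V checkpoint via diffusers.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable on a single GPU

prompt = "a drone flying through a cyberpunk alleyway at night"
result = pipe(prompt, num_inference_steps=25, num_frames=16)
export_to_video(result.frames[0], "drone_alley.mp4")  # first (and only) video in the batch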
We run comparative tests on:
- Motion smoothness
- Prompt-object consistency
- Loopability
- Speed vs. quality tradeoffs
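The prompt-object consistency check, for instance, can be approximated by scoring every decoded frame against the prompt with CLIP and tracking the mean and spread across the clip. A minimal sketch, assuming the openai/clip-vit-base-patch32 checkpoint from transformers; this stands in for, rather than reproduces, our scoring code:

# Sketch: per-frame CLIP similarity between a prompt and decoded video frames.
# A high mean suggests the prompted subject is present; a high spread suggests
# it drifts in and out across the clip.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_consistency(prompt: str, frames: list[Image.Image]) -> tuple[float, float]:
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = (image_emb @ text_emb.T).squeeze(-1)  # cosine similarity per frame
    return sims.mean().item(), sims.std().item()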
2. Multi-Stage Generation Pipelines
To improve quality and control, we chain together:
- Text → Keyframes (with Stable Diffusion or SDXL)
- Keyframes → Interpolation (using RIFE or FILM)
- Motion-aware resynthesis (via AnimateDiff or ControlNet)
- Optional: audio-to-video sync for speech-matched video
We deploy this workflow using TensorOne Clusters to parallelize each stage.
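The first stage of that chain, for example, can be as simple as batching a handful of keyframe prompts through SDXL. A minimal sketch, assuming the stabilityai/stable-diffusion-xl-base-1.0 checkpoint; prompt phrasing and output paths are illustrative:

# Sketch of the Text -> Keyframes stage: render a few keyframes with SDXL,
# to be densified later with RIFE or FILM.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

shot = "an astronaut floating in space, Earth spinning in the background"
for i in range(1, 5):
    kf_prompt = f"{shot}, keyframe {i} of 4, cinematic lighting"
    image = pipe(kf_prompt, num_inference_steps=30).images[0]
    image.save(f"keyframe_{i:02d}.png")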
3. LoRA and Motion Module Training
We've contributed custom LoRA adapters for motion styles like:
- cyberpunk-tracking: stylized tracking camera motion
- 3d-turntable-spin: full 360° slow pan
- anime-fight-loop: fast-cut, jittery action sequences
We've also trained motion-aware AnimateDiff modules on:
- Cinemagraph datasets
- TikTok-style clips with embedded captions
- Storyboard-to-animation transitions
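At inference time a motion module and a style LoRA are loaded on top of a base image model. A minimal sketch with the diffusers AnimateDiffPipeline and a public motion adapter; the base checkpoint is illustrative, and ./loras/cyberpunk-tracking is a hypothetical local path for one of the adapters above:

# Sketch: AnimateDiff inference with a motion adapter plus a custom style LoRA.
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, clip_sample=False, beta_schedule="linear",
    timestep_spacing="linspace", steps_offset=1,
)
pipe.load_lora_weights("./loras/cyberpunk-tracking")  # hypothetical local LoRA path

out = pipe(
    prompt="neon-lit alley at night, tracking shot following a courier",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
)
export_to_gif(out.frames[0], "cyberpunk_tracking.gif")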
Tooling and Infra
We maintain internal tools for:
- Prompt-to-script breakdown
- Batch video rendering across cluster queues
- Scene interpolation validation
- Video embedding indexing (CLIP + motion embeddings)
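The embedding index, for instance, can be as simple as one pooled vector per clip (say, a mean CLIP frame embedding concatenated with a motion descriptor) stored in FAISS. A minimal sketch with random placeholder vectors standing in for real embeddings:

# Sketch: cosine-similarity search over clip-level embeddings with FAISS.
import faiss
import numpy as np

dim = 512
rng = np.random.default_rng(0)
clip_vectors = rng.standard_normal((1000, dim)).astype("float32")  # placeholder embeddings
faiss.normalize_L2(clip_vectors)  # so inner product equals cosine similarity

index = faiss.IndexFlatIP(dim)
index.add(clip_vectors)

query = rng.standard_normal((1, dim)).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar clips
print(ids[0], scores[0])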
Sample job command:
tensoronecli create clusters --gpuType "A100" --imageName "text2video-train" --args "bash run_pipeline.sh"
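A batch submission loop around that command could look like the sketch below; the per-batch PROMPT_FILE variable and the way run_pipeline.sh consumes it are assumptions, not documented behavior:

# Sketch: submit one cluster job per prompt batch via the CLI command above.
import subprocess

prompt_files = ["prompts/batch_00.txt", "prompts/batch_01.txt", "prompts/batch_02.txt"]

for prompt_file in prompt_files:
    subprocess.run(
        [
            "tensoronecli", "create", "clusters",
            "--gpuType", "A100",
            "--imageName", "text2video-train",
            "--args", f"PROMPT_FILE={prompt_file} bash run_pipeline.sh",  # assumes a shell-style env prefix
        ],
        check=True,
    )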
Experimental Outputs
Some of our early successful generations include:
- “A drone flying through a cyberpunk alleyway at night”
- “An astronaut floating in space, Earth spinning in the background”
- “Studio-lit product ad with animated particles and reflections”
- “Low-poly 3D character dancing in sync with background music”
Research Goals Ahead
- Long-form generation beyond 8s without quality collapse
- Audio-synced animation with Whisper-aligned captions (see the sketch after this list)
- Prompt-to-storyboard-to-video chaining
- Fine-grained motion editing (speed ramping, masking)
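On the audio-sync goal, word-level timestamps from Whisper already give usable anchors for caption and beat timing. A rough sketch with the open-source whisper package; the audio path is illustrative:

# Sketch: pull word-level timestamps from openai-whisper to drive caption timing.
import whisper

model = whisper.load_model("base")
result = model.transcribe("voiceover.wav", word_timestamps=True)

for segment in result["segments"]:
    for word in segment.get("words", []):
        # Each word carries start/end times in seconds, usable as keyframe anchors.
        print(f'{word["start"]:6.2f}s - {word["end"]:6.2f}s {word["word"].strip()}')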
Text-to-video is still an experimental frontier—but one with immense potential.
We're not just waiting for breakthroughs.
We're prototyping them—frame by frame, clip by clip.