MOVA 720p
MOVA 720p is worth evaluating for vision when its provider route and context window match the workload.
Use it for
- Teams evaluating vision
- Buyers assessing the one tracked provider route
Do not use it for
- Strict JSON or tool-calling flows

- Family: MOVA
- Released: 2026-01-29
- Parameters: 32B total / 18B active
- Architecture: mixture-of-experts-dual-tower
- Specialization: video-audio-generation
- License: Apache 2.0
- Training: pretrained
Only tracked route · Hugging Face Inference Endpoints
About
MOVA 720p is the higher-resolution open-weight MOVA checkpoint for synchronized video-audio generation. MOSI Intelligence and the OpenMOSS Team describe MOVA as a 32B-parameter mixture-of-experts model with 18B active parameters during inference, designed for native image-to-video-audio and text-to-video-audio generation with synchronized audio, lip sync, and sound effects.
MOVA 720p is a model in the MOVA family. The structured metadata tracks multimodal input and audio. This page tracks provider routes through Hugging Face Inference Endpoints. No headline benchmark score is tracked for MOVA 720p yet.
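The parameter counts quoted above (32B total, 18B active) can be turned into a rough sizing estimate. The sketch below is a back-of-the-envelope calculation, not a measured figure. It assumes 16-bit weights (2 bytes per parameter), which this page does not state, and it covers weights only, ignoring activations and any auxiliary components. The function names are illustrative.

```python
# Rough sizing from the parameter counts quoted in the model description.
TOTAL_PARAMS = 32e9   # 32B total parameters (from the description above)
ACTIVE_PARAMS = 18e9  # 18B active parameters during inference

# Assumption (not stated on this page): weights stored in a 16-bit format.
BYTES_PER_PARAM = 2


def active_fraction(total: float, active: float) -> float:
    """Share of parameters that participate in a single forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active fraction: {frac:.2%}")
    print(f"Weights, all experts resident: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
    print(f"Weights touched per pass:      {weight_memory_gb(ACTIVE_PARAMS):.0f} GB")
```

In mixture-of-experts models the full set of weights typically has to stay resident even though only the active subset is used per pass, so the total count, not the active count, is usually the better guide when sizing a dedicated endpoint.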
Top use-case fit
Vision
Included by capability and metadata signals in the decision map.
Provider price ladder
Compare API pricing across 1 provider for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| Hugging Face Inference Endpoints | - | - | Partial |
Capabilities
Benchmark peer bars for Vision
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.