MOVA Models by MOSI Intelligence
2 models, 2026
About
MOVA is an open-weight video-audio generation family from MOSI Intelligence and the OpenMOSS Team. It targets synchronized image-to-video-audio and text-to-video-audio generation with native audio, lip sync, sound effects, and an asymmetric dual-tower mixture-of-experts architecture.
Current Variants
The "Use when" guidance below is derived from the seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| MOVA 360p | Use when the workload needs video audio generation, multimodal inputs, and audio. | 2026-01 | video audio generation, multimodal inputs, audio | Current |
| MOVA 720p | Use when the workload needs video audio generation, multimodal inputs, and audio. | 2026-01 | video audio generation, multimodal inputs, audio | Current |
Release Timeline
1 release group
Specifications (2 models)
Available From (1 provider)
Frequently Asked Questions
- What is MOVA used for?
- MOVA is used for video-audio generation and other vision and multimodal work. The family description and listed model capabilities point to these workloads as the best fit.
- How does MOVA compare to MOSS-Audio?
- MOVA is strongest when you need multimodal video-audio generation, while MOSS-Audio, also from MOSI Intelligence, is the closest related family to check for multimodal work. MOVA has 2 listed variants, so compare the specs and pricing tables before choosing a production model.
- Which MOVA model should I use?
- If price is the main constraint, start with the pricing table, but note that the local data does not include complete provider pricing for MOVA. Both variants were released in 2026-01; as a starting point, evaluate MOVA 360p with multimodal inputs, and compare it against MOVA 720p if you need higher-resolution output.




