LLM Reference

MOVA Models by MOSI Intelligence

MOSI Intelligence · Apache 2.0 · Open Source · Multimodal
2 models · 2026

About

MOVA is an open-weight video-audio generation family from MOSI Intelligence and the OpenMOSS Team. It targets synchronized image-to-video-audio and text-to-video-audio generation with native audio, lip sync, sound effects, and an asymmetric dual-tower mixture-of-experts architecture.

Current Variants


MOVA 360p (Current)

Use when the workload needs synchronized video-audio generation from image or text inputs at 360p output resolution.

Released 2026-01 · video-audio generation · multimodal inputs · audio
MOVA 720p (Current)

Use when the workload needs synchronized video-audio generation from image or text inputs at 720p output resolution.

Released 2026-01 · video-audio generation · multimodal inputs · audio

Release Timeline

1 release group
2026-01: 2 current models
MOVA 360p (Current): video-audio generation, multimodal inputs, audio
MOVA 720p (Current): video-audio generation, multimodal inputs, audio

Specifications (2 models)

MOVA model specifications comparison
Model | Released | Parameters | Vision | Multimodal
MOVA 360p | 2026-01 | 32B total / 18B active | Yes | Yes
MOVA 720p | 2026-01 | 32B total / 18B active | Yes | Yes
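Both variants list 32B total and 18B active parameters. As a rough guide, the total count sets how much memory the weights occupy, while the active count indicates how much of the network runs on each step. The sketch below turns those two figures into estimates. The dtype choices and the assumption that every weight stays resident in memory are ours, not published MOVA hardware requirements, and activations and auxiliary components are ignored.

```python
# Back-of-the-envelope weight-memory estimate for the MOVA variants.
# Assumptions (not from MOVA documentation): all parameters are kept
# resident in memory; activations, encoders, and framework buffers are ignored.

TOTAL_PARAMS = 32e9   # "32B total" from the specifications table
ACTIVE_PARAMS = 18e9  # "18B active" from the specifications table

# Storage cost per parameter for a few common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Memory needed to hold `params` weights in `dtype`, in decimal GB."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def active_fraction(total: float, active: float) -> float:
    """Share of all parameters that participate in a single step."""
    return active / total


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(TOTAL_PARAMS, dtype):.0f} GB of weights")
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
```

Because the active share is just over half, per-step compute is closer to that of an 18B model, but under the stated assumption the memory footprint still follows the full 32B.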

Available From (1 provider)

Frequently Asked Questions

What is MOVA used for?
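MOVA is used for synchronized video-audio generation from image or text inputs, including native audio, lip sync, and sound effects. The family description and the listed model capabilities (video-audio generation, multimodal inputs, audio) point to these workloads as the best fit.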
How does MOVA compare to MOSS-Audio?
MOVA and MOSS-Audio both come from MOSI Intelligence. MOVA targets joint video and audio generation, while MOSS-Audio is the closest related family to check for multimodal work. MOVA has 2 listed variants, so compare the specifications and any available pricing before choosing a production model.
Which MOVA model should I use?
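Complete provider pricing for MOVA is not available in the local data, so if price is the main constraint, confirm current rates with the provider before choosing. Both variants share the same 32B total / 18B active parameter budget and were released together in 2026-01; as their names indicate, they differ in output resolution. Evaluate MOVA 720p when output quality is the priority, and MOVA 360p when a likely lighter option for faster iteration is enough.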
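Since the two variants have identical parameter counts, output resolution is the main practical difference between them. The sketch below assumes the names follow the common 16:9 convention (640x360 and 1280x720); these frame sizes are an assumption, not a published MOVA specification. Under that assumption the 720p variant produces four times as many pixels per frame, a rough proxy for the extra generation work per frame.

```python
# Per-frame pixel counts for the two MOVA variants.
# Assumption: "360p" and "720p" mean standard 16:9 frames of 640x360 and
# 1280x720; actual MOVA output sizes may differ (e.g. other aspect ratios).

RESOLUTIONS = {
    "MOVA 360p": (640, 360),
    "MOVA 720p": (1280, 720),
}


def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in a single frame."""
    return width * height


def pixel_ratio(a: str, b: str) -> float:
    """How many times more pixels per frame variant `a` has than variant `b`."""
    return pixels_per_frame(*RESOLUTIONS[a]) / pixels_per_frame(*RESOLUTIONS[b])


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        print(f"{name}: {pixels_per_frame(w, h):,} pixels per frame")
    print(f"720p / 360p: {pixel_ratio('MOVA 720p', 'MOVA 360p'):.1f}x")
```

Doubling both width and height quadruples the pixel count, which is why the jump from 360p to 720p is larger than the names alone might suggest.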

Models (2)