What is MOVA used for?

MOVA is used for generating video with synchronized audio from text or image inputs. The family description and the listed model capabilities (vision and multimodal) point to text-to-video-audio and image-to-video-audio generation as the best fit.

How does MOVA compare to MOSS-Audio?

MOVA and MOSS-Audio are both MOSI AI families tagged as multimodal. MOVA is the fit when you need video generation with synchronized audio, and MOSS-Audio is the closest related family to check as an alternative. MOVA has 2 listed variants, so compare the specs and pricing tables before choosing a production model.

Which MOVA model should I use?

MOVA does not have complete provider pricing in the local data, so if price is the main constraint, check the pricing table first. Both variants were released in 2026-01 and list the same parameter counts. The local data suggests MOVA 360p with multimodal inputs as the first variant to evaluate.

MOVA Models by MOSI AI

MOSI AI | Apache 2.0 | Open source | Multimodal

2 models | 2026

Details

Researcher: MOSI AI
License: Apache 2.0 (OSI-approved)
Commercial use: permitted
Models: 2
Released: 2026

Capabilities

Vision: all models
Multimodal: all models

About

MOVA is an open-weight video-audio generation family from MOSI AI and the OpenMOSS Team. It targets synchronized image-to-video-audio and text-to-video-audio generation with native audio, lip sync, sound effects, and an asymmetric dual-tower mixture-of-experts architecture.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

MOVA 360p (Current)

Use when the workload needs video generation, multimodal inputs, and audio.

Released 2026-01. Signals: video generation, multimodal inputs, audio.

MOVA 720p (Current)

Use when the workload needs video generation, multimodal inputs, and audio.

Released 2026-01. Signals: video generation, multimodal inputs, audio.

Current MOVA variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
MOVA 360p	Use when the workload needs video generation, multimodal inputs, and audio.	2026-01	video generation, multimodal inputs, audio	Current
MOVA 720p	Use when the workload needs video generation, multimodal inputs, and audio.	2026-01	video generation, multimodal inputs, audio	Current

Release Timeline

1 release group

2026-01 (2 current)

MOVA 360p: video generation, multimodal inputs, audio (Current)

MOVA 720p: video generation, multimodal inputs, audio (Current)

Specifications (2 models)

MOVA model specifications comparison
Model	Released	Parameters	Vision	Multimodal
MOVA 360p	2026-01	32B total / 18B active	Yes	Yes
MOVA 720p	2026-01	32B total / 18B active	Yes	Yes
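
The "total / active" split in the table can be read as a ratio. The following is a minimal sketch, assuming "active" means the parameters used per forward pass, as is typical for mixture-of-experts models. The `VariantSpec` class and `SPECS` list are hypothetical names introduced for illustration; only the parameter counts come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSpec:
    """Hypothetical container for one row of the specifications table."""
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # active parameters, in billions (assumed: per forward pass)

    @property
    def active_fraction(self) -> float:
        # Share of the total parameters that is active at once.
        return self.active_params_b / self.total_params_b


# Values copied from the specifications table.
SPECS = [
    VariantSpec("MOVA 360p", total_params_b=32.0, active_params_b=18.0),
    VariantSpec("MOVA 720p", total_params_b=32.0, active_params_b=18.0),
]

for spec in SPECS:
    print(f"{spec.name}: {spec.active_fraction:.2%} of parameters active")
```

Both variants list the same counts, so the ratio is identical for each. Under the assumption above, it is the active count rather than the total that drives per-step compute, while the total count still determines how much weight storage is needed.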

Available From (1 provider)

Hugging Face Inference Endpoints

Models (2)