LLM Reference

MOVA 720p

Released: 2026-01-29
Last refreshed: 2026-06-04
Status: Researched today
Multimodal · Vision · Open Source

MOVA 720p is worth evaluating for vision when its provider route and context window match the workload.

Use it for

  • Teams evaluating vision
  • Buyers assessing the single tracked provider route

Do not use it for

  • Strict JSON or tool-calling flows

Specifications

Family: MOVA
Released: 2026-01-29
Parameters: 32B total / 18B active
Architecture: mixture-of-experts-dual-tower
Specialization: video-audio-generation
License: Apache 2.0
Training: pretrained

Created by

OpenMOSS audio and video foundation-model research.

Shanghai, China

Pricing

Output / 1M: -
Input / 1M: -

Cheapest of 1 route · Hugging Face Inference Endpoints

About

MOVA 720p is the higher-resolution open-weight MOVA checkpoint for synchronized video-audio generation. MOSI Intelligence and the OpenMOSS Team describe MOVA as a 32B-parameter mixture-of-experts model with 18B active parameters during inference, designed for native image-to-video-audio and text-to-video-audio generation with synchronized audio, lip sync, and sound effects.
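
As a rough illustration of what the 32B total / 18B active figures imply, the sketch below computes the active-parameter fraction and a weights-only memory estimate. The byte widths per precision, and the assumption that all 32B parameters must be held in memory, are not stated on this page. They are illustrative only, and real requirements depend on the checkpoint format and the runtime.

```python
# Figures taken from this page: 32B total parameters, 18B active during inference.
TOTAL_PARAMS = 32e9
ACTIVE_PARAMS = 18e9

# Assumed storage widths in bytes per parameter (not stated on this page).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def active_fraction(total: float, active: float) -> float:
    """Share of parameters that take part in a single inference step."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Weights-only estimate in decimal GB; ignores activations and runtime overhead."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.4f}")
    for precision, width in BYTES_PER_PARAM.items():
        print(f"{precision}: {weight_memory_gb(TOTAL_PARAMS, width):.0f} GB of weights")
```

The estimate sizes storage from the total count, on the assumption that every expert has to be loaded even though only the active subset computes per step. Treat it as a first-pass sizing check, not a hardware requirement.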

MOVA 720p belongs to the MOVA family. Its structured metadata lists multimodal input and audio. This page tracks one provider route, through Hugging Face Inference Endpoints. No headline benchmark score is tracked for MOVA 720p yet.

Top use-case fit

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare API pricing for the 1 tracked provider: input and output tokens, plus batch and cached reads when available.

Provider                          Input / 1M   Output / 1M   Route
Hugging Face Inference Endpoints  -            -             Partial
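
A price ladder like the one above can be compared programmatically once prices are filled in. The sketch below is a minimal example under stated assumptions: the `Route` record, `parse_price`, and `cheapest_priced` are hypothetical names, a `-` cell is read as "price not tracked", and ranking by the sum of input and output price is an arbitrary choice, not this site's method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical record for one row of the price ladder."""
    provider: str
    input_per_1m: Optional[float]   # price per 1M input tokens, None if untracked
    output_per_1m: Optional[float]  # price per 1M output tokens, None if untracked
    status: str


def parse_price(cell: str) -> Optional[float]:
    """Read a price cell; '-' or an empty cell means no price is tracked."""
    cell = cell.strip()
    if cell in {"-", ""}:
        return None
    return float(cell.lstrip("$"))


def cheapest_priced(routes: List[Route]) -> Optional[Route]:
    """Return the route with the lowest input + output price, or None if none are priced."""
    priced = [
        r for r in routes
        if r.input_per_1m is not None and r.output_per_1m is not None
    ]
    if not priced:
        return None
    return min(priced, key=lambda r: r.input_per_1m + r.output_per_1m)


# The single row tracked on this page: both price cells are '-'.
routes = [
    Route("Hugging Face Inference Endpoints", parse_price("-"), parse_price("-"), "Partial"),
]
```

With only the row shown on this page, `cheapest_priced(routes)` returns `None`, because neither price is tracked. Dedicated endpoints are often billed by compute time, not per token, so a per-token comparison may not apply to this route at all.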

Capabilities

Vision · Multimodal · Audio

Benchmark peer bars for Vision

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (7)