MOVA 720p

Name: MOVA 720p
Author: MOSI AI

Released: 2026-01-29
Last refreshed: 2026-06-29
Status: Researched 44d ago

Open source · Commercial use: permitted · Multimodal · Vision

MOVA 720p is worth evaluating for vision tasks, in particular synchronized video-audio generation, when its provider route and context window match the workload.

Use it for

Teams evaluating vision models
Buyers assessing the 1 tracked provider route

Do not use it for

Strict JSON or tool-calling flows

Specifications

Family: MOVA
Released: 2026-01-29
Parameters: 32B total / 18B active
Architecture: Mixture of Experts
Specialization: video-generation
Openness: Open source
License: Apache 2.0 (OSI-approved; commercial use permitted)
Weights: Available
Code: Unknown
Training: Pretrained

Created by

MOSI AI

OpenMOSS speech, audio, and video foundation-model research.

Shanghai, China

Pricing

Output / 1M

Input / 1M

Cheapest of 1 route · Hugging Face Inference Endpoints

Providers (1)

Hugging Face Inference Endpoints

Links

Website · Hugging Face

About

MOVA 720p is the higher-resolution open-weight MOVA checkpoint for synchronized video-audio generation. MOSI AI and the OpenMOSS Team describe MOVA as a 32B-parameter mixture-of-experts model with 18B active parameters during inference, designed for native image-to-video-audio and text-to-video-audio generation with synchronized audio, lip sync, and sound effects.
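To put the parameter counts above in concrete terms, the following is a minimal back-of-the-envelope sketch. It computes the share of parameters active per inference step and a rough size for the weights alone. The helper names are hypothetical, and the 2 bytes per parameter figure assumes 16-bit weights; the page does not state the precision of the released checkpoint, and the estimate ignores activations and any other runtime memory.

```python
# Rough sizing sketch for a mixture-of-experts model.
# Parameter counts come from the page (32B total / 18B active).
# ASSUMPTION: 2 bytes per parameter (16-bit weights); the page does not
# state the checkpoint's actual precision.

TOTAL_PARAMS_B = 32.0   # total parameters, in billions
ACTIVE_PARAMS_B = 18.0  # parameters active during inference, in billions


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's parameters that are active during inference."""
    return active_params_b / total_params_b


def weight_memory_gb(params_b: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_b * bytes_per_param


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active share of parameters: {frac:.2%}")
    print(f"Weights at 16-bit (assumed): {weight_memory_gb(TOTAL_PARAMS_B):.0f} GB")
```

Under these assumptions, all 32B parameters still have to be stored even though only 18B are used per step, so the storage estimate is driven by the total count rather than the active count.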

MOVA 720p is an open-source model in the MOVA family, released under Apache 2.0. The structured metadata tracks multimodal input and audio. This page tracks one provider route, Hugging Face Inference Endpoints. No headline benchmark score is tracked for MOVA 720p yet.