MOSI AI

7 models across 3 families · Latest: MOSS-TTS-v1.5 (2026-05)

Researched 25d ago

OpenMOSS speech, audio, and video foundation-model research.

Vision · China · Open Source

MOSI AI's portfolio covers 7 active models across 3 current families, all tracked under a single workload area: vision. Open a model detail page to compare provider routes and sourced benchmarks.
Last verified 2026-06-29.

Use it for

Teams evaluating vision across this lab's releases
Comparing model families before committing to a flagship
Migration and pricing follow-ups across 7 tracked models

Do not use it for

Choosing a hosting provider (open a model page to compare price ladders first)

Active models

7

Current models from this lab, excluding deprecated ones

Active families

3

Current model families from this lab

Open catalog

7 open

7 open source / 0 open weights

Lowest output price

Not tracked

No provider output pricing linked yet

Latest dated release

2026-05-26

MOSS-TTS-v1.5

Freshness

2026-06-29

fresh

Information

Shanghai, China

Links

Website · GitHub · Hugging Face

Release cadence

Showing 5 recent dated releases. Latest: MOSS-TTS-v1.5 (2026-05-26).

Where this lab wins

Vision: 6 tracked models with multimodal benchmark coverage.

Flagship quality / price signal

Flagship: MOSS-Audio 4B Instruct.

Quality-per-dollar unavailable for this flagship — benchmark coverage or output token pricing is still missing.

MOSI AI is a Chinese AI research organization focused on OpenMOSS speech, audio, and video foundation-model research. It ships 3 model families totaling 7 models; the most recent release is MOSS-TTS-v1.5 (2026-05). Notable families include MOSS-TTS, MOSS-Audio, and MOVA. Use this page as a stable reference for lab background, release coverage, and follow-up model pages as they are added. Researchers and evaluators can view official API endpoints, benchmark performance, and coding/agent fit for every MOSI AI model.

About

MOSI AI is the organization behind the OpenMOSS Team's open-source speech, audio, and video foundation models, including MOSS-TTS for text-to-speech, MOSS-Audio for real-world audio understanding, and MOVA for synchronized video-audio generation. Its OpenMOSS presence publishes research code, model cards, and weights through GitHub and Hugging Face. It should be tracked separately from Kyutai's Moshi voice model family.

Featured models

Model	Released	Context	Input price ($/1M tokens)	Output price ($/1M tokens)	License	Openness
MOSS-TTS-v1.5	2026-05-26	-	-	-	Apache 2.0	Open source
MOSS-Audio 4B Instruct	2026-04-13	-	-	-	Apache 2.0	Open source
MOSS-Audio 4B Thinking	2026-04-13	-	-	-	Apache 2.0	Open source

Model families

MOSS-TTS

MOSS-Audio

MOVA

FAQ

What models has MOSI AI released?

MOSI AI ships 7 models across 3 families: MOSS-TTS, MOSS-Audio, and MOVA.

Is MOSI AI's technology open source?

Yes. All tracked models are released under the Apache 2.0 license.

Where is MOSI AI headquartered?

MOSI AI is headquartered in Shanghai, China.

What is MOSI AI known for?

MOSI AI is known for OpenMOSS speech, audio, and video foundation-model research. Its most prominent tracked family is MOSS-TTS.

How can I access MOSI AI's models?

MOSI AI publishes research code, model cards, and weights on GitHub and Hugging Face, and its models are available via Hugging Face Inference Endpoints.

Explore related pages

MOSS-TTS model family · MOSS-Audio model family · MOVA model family · MOSS-TTS-v1.5 model spec · MOSS-Audio 4B Instruct model spec · MOSS-Audio 4B Thinking model spec · 01.AI · Tsinghua Knowledge Engineering Group (THUDM) · Baichuan Intelligent Technology · Baidu AI

Last reviewed: 2026-06-29. Data sourced from public lab announcements and provider documentation.