7 models across 3 families · Latest: MOSS-TTS-v1.5 (2026-05)
OpenMOSS speech, audio, and video foundation-model research.
MOSI AI's portfolio covers 7 active models across 3 non-obsolete families. The only task label currently applied is vision. Open a model detail page to compare provider routes and sourced benchmarks.
Portfolio context: 1 decision-task tag, 7 active tracked models, latest research stamp 2026-06-04.
Use it for
- Teams evaluating vision across this lab's releases
- Readers comparing families before locking a flagship SKU
- Following up on migration and pricing across the 7 tracked SKUs
Do not use it for
- Choosing a hosting provider without opening a model page for price ladders
| Metric | Value | Notes |
|---|---|---|
| Active models | 7 | Non-deprecated SKUs linked to this researcher |
| Active families | 3 | Non-obsolete families in coverage |
| Open catalog | 7 open | 7 OSI source / 0 open weights (0 text-match) |
| Decision task tags | 1 | Mapped to the site-wide task taxonomy |
| Latest dated release | 2026-05-26 | MOSS-TTS-v1.5 |
| Freshness | 2026-06-04 | Researched 6d ago |
Release cadence
Showing the 5 most recent dated releases (full timeline below). Latest spotlight: MOSS-TTS-v1.5 (2026-05-26).
Where this lab wins
- Vision: 6 tracked models with multimodal benchmark coverage.
Flagship quality / price signal
Anchor SKU: MOSS-Audio 4B Instruct (best sourced coding Q/$ in this portfolio).
Quality / dollar is unavailable for this anchor: benchmark coverage, the output token price on the cheapest ladder route, or both are missing. Open the model detail page once pricing lands.
MOSI AI is a Chinese AI research organization focused on OpenMOSS speech, audio, and video foundation-model research. It ships 3 model families totaling 7 models; the most recent release is MOSS-TTS-v1.5 (2026-05). Notable families include MOSS-TTS, MOSS-Audio, and MOVA. Use this page as a stable reference for lab background, release coverage, and follow-up model pages as they are added. Researchers and evaluators can view official API endpoints, benchmark performance, and coding/agent fit for every MOSI AI model.
About
MOSI AI is the organization behind the OpenMOSS Team's open-weight speech, audio, and video foundation models, including MOSS-TTS for text-to-speech, MOSS-Audio for real-world audio understanding, and MOVA for synchronized video-audio generation. Its OpenMOSS presence publishes research code, model cards, and weights through GitHub and Hugging Face, and should be tracked separately from Kyutai's Moshi voice model family.
Featured models
| Model | Released | Context | Input price ($/1M) | Output price ($/1M) | License | Openness |
|---|---|---|---|---|---|---|
| MOSS-TTS-v1.5 | 2026-05-26 | - | - | - | Apache 2.0 | Open source |
| MOSS-Audio 4B Instruct | 2026-04-13 | - | - | - | Apache 2.0 | Open source |
| MOSS-Audio 4B Thinking | 2026-04-13 | - | - | - | Apache 2.0 | Open source |
Model families
Recent releases
- MOSS-TTS-v1.5 (2026-05-26)
- MOSS-Audio 4B Instruct (2026-04-13)
- MOSS-Audio 4B Thinking (2026-04-13)
- MOSS-Audio 8B Instruct (2026-04-13)
- MOSS-Audio 8B Thinking (2026-04-13)
FAQ
What models has MOSI AI released?
MOSI AI ships 7 models across 3 families: MOSS-TTS, MOSS-Audio, and MOVA.
Is MOSI AI's technology open source?
All tracked models are released under Apache 2.0.
Where is MOSI AI headquartered?
MOSI AI is headquartered in Shanghai, China.
What is MOSI AI known for?
MOSI AI is known for OpenMOSS speech, audio, and video foundation-model research. Its most prominent tracked family is MOSS-TTS.
How can I access MOSI AI's models?
MOSI AI's models are available via Hugging Face Inference Endpoints.
Last reviewed: 2026-06-04. Data sourced from public lab announcements and provider documentation.