MOSS-Audio Models by MOSI Intelligence
About
MOSS-Audio is an open-weight audio-language model family for unified audio understanding across speech, environmental sound, music, captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning. The April 2026 release includes 4B and 8B Instruct and Thinking variants built with a dedicated audio encoder, modality adapter, and Qwen3 language-model backbones.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| MOSS-Audio 4B Instruct | Use when the workload needs audio understanding, 4.6B parameters, and multimodal inputs. | 2026-04 | audio understanding, 4.6B parameters, multimodal inputs | Current |
| MOSS-Audio 4B Thinking | Use when the workload needs audio understanding, 4.6B parameters, and reasoning. | 2026-04 | audio understanding, 4.6B parameters, reasoning | Current |
| MOSS-Audio 8B Instruct | Use when the workload needs audio understanding, 8.6B parameters, and multimodal inputs. | 2026-04 | audio understanding, 8.6B parameters, multimodal inputs | Current |
| MOSS-Audio 8B Thinking | Use when the workload needs audio understanding, 8.6B parameters, and reasoning. | 2026-04 | audio understanding, 8.6B parameters, reasoning | Current |
Release Timeline
1 release group
Specifications (4 models)
| Model | Released | Parameters | Multimodal | Reasoning |
|---|---|---|---|---|
| MOSS-Audio 4B Instruct | 2026-04 | 4.6B | Yes | No |
| MOSS-Audio 4B Thinking | 2026-04 | 4.6B | Yes | Yes |
| MOSS-Audio 8B Instruct | 2026-04 | 8.6B | Yes | No |
| MOSS-Audio 8B Thinking | 2026-04 | 8.6B | Yes | Yes |
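The specifications above can be treated as data when shortlisting a variant. The following is a minimal Python sketch, not an official tool: the `Variant` class, the `VARIANTS` list, and the `pick_variant` helper are hypothetical names introduced here, and the rule "prefer the largest variant that fits the parameter budget" is an assumption for illustration rather than guidance from the listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """One row of the specifications table (hypothetical helper type)."""
    name: str
    released: str
    params_b: float   # parameter count in billions, as listed
    multimodal: bool
    reasoning: bool


# Values copied from the specifications table.
VARIANTS = [
    Variant("MOSS-Audio 4B Instruct", "2026-04", 4.6, True, False),
    Variant("MOSS-Audio 4B Thinking", "2026-04", 4.6, True, True),
    Variant("MOSS-Audio 8B Instruct", "2026-04", 8.6, True, False),
    Variant("MOSS-Audio 8B Thinking", "2026-04", 8.6, True, True),
]


def pick_variant(needs_reasoning: bool,
                 max_params_b: Optional[float] = None) -> Optional[Variant]:
    """Return the largest variant matching the reasoning flag and budget.

    Assumption: a larger parameter count is preferred when it fits.
    Returns None when no listed variant satisfies the constraints.
    """
    candidates = [
        v for v in VARIANTS
        if v.reasoning == needs_reasoning
        and (max_params_b is None or v.params_b <= max_params_b)
    ]
    return max(candidates, key=lambda v: v.params_b, default=None)


if __name__ == "__main__":
    print(pick_variant(needs_reasoning=True))
    print(pick_variant(needs_reasoning=False, max_params_b=5.0))
```

Parameter count stands in for serving cost here only because the listing has no complete pricing data; swap in real provider pricing when it is available.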
Available From (1 provider)
Frequently Asked Questions
- What is MOSS-Audio used for?
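- MOSS-Audio is used for audio understanding and multimodal work, including speech, environmental sound, music, captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning. The family description and listed model capabilities point to those workloads as the best fit.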
- How does MOSS-Audio compare to MOVA?
- MOSS-Audio by MOSI Intelligence is strongest for multimodal audio understanding, while MOVA, also by MOSI Intelligence, is the closest related family to check for multimodal work. MOSS-Audio has 4 listed variants, so compare the specifications table, and provider pricing where available, before choosing a production model.
- Which MOSS-Audio model should I use?
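- If price is the main constraint, check provider pricing directly, because the local data does not include complete provider pricing for MOSS-Audio. All four variants were released in 2026-04. For reasoning over multimodal inputs, evaluate a Thinking variant: this listing suggests MOSS-Audio 4B Thinking, and MOSS-Audio 8B Thinking offers the same listed capabilities at 8.6B parameters.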




