LLM Reference

MOSS-Audio Models by MOSI Intelligence

MOSI Intelligence · Apache 2.0 · Open Source · Multimodal
4 models · 2026

About

MOSS-Audio is an open-weight audio-language model family for unified audio understanding across speech, environmental sound, music, captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning. The April 2026 release includes 4B and 8B Instruct and Thinking variants built with a dedicated audio encoder, modality adapter, and Qwen3 language-model backbones.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.


MOSS-Audio 4B Instruct
Use when the workload needs audio understanding, 4.6B parameters, and multimodal inputs.
2026-04 · audio understanding · 4.6B parameters · multimodal inputs

MOSS-Audio 4B Thinking
Use when the workload needs audio understanding, 4.6B parameters, and reasoning.
2026-04 · audio understanding · 4.6B parameters · reasoning

MOSS-Audio 8B Instruct
Use when the workload needs audio understanding, 8.6B parameters, and multimodal inputs.
2026-04 · audio understanding · 8.6B parameters · multimodal inputs

MOSS-Audio 8B Thinking
Use when the workload needs audio understanding, 8.6B parameters, and reasoning.
2026-04 · audio understanding · 8.6B parameters · reasoning
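
The use-when sentences above read as if they are assembled from each variant's tag list. The following is a minimal sketch of that idea, assuming a hypothetical `use_when` helper that simply joins the tags. The site's actual generation logic is not documented here, so treat this as an illustration only.

```python
def use_when(tags):
    """Join capability tags into a 'Use when ...' sentence.

    Hypothetical helper: the name and the joining rule are assumptions,
    not the reference site's documented implementation.
    """
    if not tags:
        return "Use when the workload has no special requirements."
    if len(tags) == 1:
        needs = tags[0]
    elif len(tags) == 2:
        needs = f"{tags[0]} and {tags[1]}"
    else:
        # Serial comma before the final item, matching the sentences above.
        needs = ", ".join(tags[:-1]) + f", and {tags[-1]}"
    return f"Use when the workload needs {needs}."


# Tags taken from the 4B Thinking entry in this listing.
sentence = use_when(["audio understanding", "4.6B parameters", "reasoning"])
print(sentence)
```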

Release Timeline

2026-04 (1 release group, 4 current)

MOSS-Audio 4B Instruct (Current)
audio understanding · 4.6B parameters · multimodal inputs

MOSS-Audio 4B Thinking (Current)
audio understanding · 4.6B parameters · reasoning

MOSS-Audio 8B Instruct (Current)
audio understanding · 8.6B parameters · multimodal inputs

MOSS-Audio 8B Thinking (Current)
audio understanding · 8.6B parameters · reasoning

Specifications (4 models)

MOSS-Audio model specifications comparison
Model                     Released  Parameters  Multimodal  Reasoning
MOSS-Audio 4B Instruct    2026-04   4.6B        Yes         No
MOSS-Audio 4B Thinking    2026-04   4.6B        Yes         Yes
MOSS-Audio 8B Instruct    2026-04   8.6B        Yes         No
MOSS-Audio 8B Thinking    2026-04   8.6B        Yes         Yes
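
One way to use the specifications programmatically is to encode the rows as records and filter them by requirement. The sketch below does that with the values from the table. The `Variant` class and the `pick` function are hypothetical names introduced for illustration, not part of any MOSS-Audio tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One row of the specifications table (hypothetical record type)."""
    name: str
    released: str
    params_b: float  # parameter count in billions
    multimodal: bool
    reasoning: bool


# Values copied from the specifications table in this listing.
VARIANTS = [
    Variant("MOSS-Audio 4B Instruct", "2026-04", 4.6, True, False),
    Variant("MOSS-Audio 4B Thinking", "2026-04", 4.6, True, True),
    Variant("MOSS-Audio 8B Instruct", "2026-04", 8.6, True, False),
    Variant("MOSS-Audio 8B Thinking", "2026-04", 8.6, True, True),
]


def pick(variants, *, reasoning, max_params_b=None):
    """Return variants matching the reasoning flag and an optional size cap,
    ordered from smallest to largest (hypothetical helper)."""
    matches = [
        v for v in variants
        if v.reasoning == reasoning
        and (max_params_b is None or v.params_b <= max_params_b)
    ]
    return sorted(matches, key=lambda v: v.params_b)


# List the reasoning-capable variants, smallest first.
for v in pick(VARIANTS, reasoning=True):
    print(v.name, f"{v.params_b}B")
```

Filtering on the table's own columns keeps the choice tied to listed facts. Pricing is left out on purpose because the listing does not include complete provider pricing.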

Available From (1 provider)

Frequently Asked Questions

What is MOSS-Audio used for?
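MOSS-Audio is used for audio understanding and related multimodal work: speech, environmental sound, and music understanding, captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning. The family description and listed model capabilities point to those workloads as the best fit.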
How does MOSS-Audio compare to MOVA?
MOSS-Audio by MOSI Intelligence is strongest where you need multimodal audio understanding, while MOVA, also by MOSI Intelligence, is the closest related multimodal family to check. MOSS-Audio has 4 listed variants, so compare the specifications table and any available provider pricing before choosing a production model.
Which MOSS-Audio model should I use?
If price is the main constraint, check provider pricing first, keeping in mind that MOSS-Audio does not have complete provider pricing in the local data. For the most capable and latest local choice, evaluate MOSS-Audio 4B Thinking, which offers reasoning and multimodal inputs; MOSS-Audio 8B Thinking is the larger listed reasoning variant at 8.6B parameters.

Models (4)