MOSS-Audio 4B Instruct
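MOSS-Audio 4B Instruct is worth evaluating for audio understanding, which this page files under the Vision use case, when its provider route and context window match the workload.
Use it for
- Teams evaluating audio understanding (tracked here under Vision)
- Buyers assessing the single tracked provider route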
Do not use it for
- Strict JSON or tool-calling flows
| Field | Value |
|---|---|
| Family | MOSS-Audio |
| Released | 2026-04-13 |
| Parameters | 4.6B |
| Architecture | audio-language-transformer |
| Specialization | audio-understanding |
| License | Apache 2.0 |
| Training | pretrained |
Only tracked route: Hugging Face Inference Endpoints
About
MOSS-Audio 4B Instruct is the instruction-following 4.6B variant of MOSI Intelligence and OpenMOSS Team's open-weight audio understanding model. It combines a MOSS-Audio encoder with a Qwen3-4B language backbone for speech, environmental sound, music, captioning, time-aware question answering, timestamped ASR, and audio-grounded reasoning.
The structured metadata lists multimodal input with audio support. This page tracks one provider route, Hugging Face Inference Endpoints. No headline benchmark score is tracked for MOSS-Audio 4B Instruct yet.
Top use-case fit
Vision
Included by capability and metadata signals in the decision map.
Provider price ladder
Compare API pricing for the 1 tracked provider, covering input and output tokens plus batch and cached reads when available.
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| Hugging Face Inference Endpoints | - | - | Partial |
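The per-1M-token columns above can be turned into a per-request estimate once a route publishes prices. The following is a minimal sketch, not a pricing tool: `RoutePrice` and `estimate_cost` are hypothetical names, the prices in the usage example are placeholders rather than real quotes, and the tracked route is modeled with `None` because the table shows `-` for both columns.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass
class RoutePrice:
    """One row of the price ladder (hypothetical structure)."""
    provider: str
    input_per_million: Optional[float]   # price per 1M input tokens, None if unlisted
    output_per_million: Optional[float]  # price per 1M output tokens, None if unlisted


def estimate_cost(route: RoutePrice, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return the estimated cost of one workload, or None when the route has no listed price."""
    if route.input_per_million is None or route.output_per_million is None:
        return None
    total = (input_tokens * route.input_per_million
             + output_tokens * route.output_per_million)
    return total / TOKENS_PER_MILLION


# The tracked route has no per-token prices on this page, so it yields None.
tracked = RoutePrice("Hugging Face Inference Endpoints", None, None)
print(estimate_cost(tracked, 2_000_000, 500_000))

# Placeholder prices, for illustration only (not taken from any provider).
example = RoutePrice("example-route", 0.50, 1.50)
print(estimate_cost(example, 2_000_000, 500_000))
```

Returning `None` for an unpriced route keeps a missing quote from being mistaken for a cost of zero when routes are compared.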
Capabilities
Benchmark peer bars for Vision
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.