LLM Reference

MOSS-Audio 4B Thinking

Released
2026-04-13
Last refreshed
2026-06-04
Status
Researched today
Multimodal · Vision · Open Source

MOSS-Audio 4B Thinking is worth evaluating for audio understanding and reasoning (filed on this page under the Vision use case) when its provider route and context window match the workload.

Use it for

  • Teams evaluating vision
  • Buyers comparing 1 tracked provider route

Do not use it for

  • Strict JSON or tool-calling flows

Specifications
Released
2026-04-13
Parameters
4.6B
Architecture
audio-language-transformer
Specialization
audio-understanding
License
Apache 2.0
Training
pretrained
Created by

OpenMOSS audio and video foundation-model research.

Shanghai, China
Pricing
Output / 1M
-
Input / 1M
-

Cheapest of 1 route · Hugging Face Inference Endpoints

About

MOSS-Audio 4B Thinking is the reasoning-tuned 4.6B variant of MOSI Intelligence and OpenMOSS Team's open-weight audio understanding model. It uses the MOSS-Audio encoder and Qwen3-4B backbone, adding chain-of-thought-oriented post-training for stronger complex audio reasoning while retaining speech, sound, music, timestamp, captioning, and QA coverage.

MOSS-Audio 4B Thinking is a model in the MOSS-Audio family. The structured metadata tracks three capabilities: multimodal input, audio, and reasoning. This page tracks one provider route, through Hugging Face Inference Endpoints. No headline benchmark score is tracked for MOSS-Audio 4B Thinking yet.

Top use-case fit

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare API pricing across 1 provider for input and output tokens, plus batch and cached reads when available.

Provider                           Input / 1M   Output / 1M   Route
Hugging Face Inference Endpoints   -            -             Partial
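
For readers who want to turn a price ladder like this into a per-request estimate, the arithmetic can be sketched as below. This is a minimal illustration, assuming prices are quoted in USD per 1M tokens and that a "-" cell means no price is tracked; `parse_price` and `request_cost` are hypothetical helper names, and the 0.50 / 1.50 prices are invented for the example, not taken from this page.

```python
from typing import Optional


def parse_price(cell: str) -> Optional[float]:
    """Parse a price cell (assumed USD per 1M tokens); '-' or empty means not tracked."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return float(cell.lstrip("$"))


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: Optional[float],
    output_price: Optional[float],
) -> Optional[float]:
    """Return the cost of one request in USD, or None when either price is unknown."""
    if input_price is None or output_price is None:
        return None
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The tracked route lists "-" for both prices, so no cost can be computed.
hf_cost = request_cost(12_000, 800, parse_price("-"), parse_price("-"))
print(hf_cost)

# Hypothetical prices, for illustration only (not taken from this page).
example_cost = request_cost(12_000, 800, parse_price("0.50"), parse_price("1.50"))
print(example_cost)
```

Returning `None` instead of zero for an untracked price keeps a missing entry from being mistaken for a free route when several providers are compared.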

Capabilities

Multimodal · Reasoning · Audio

Benchmark peer bars for Vision

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (8)