MOSS-Audio 4B Thinking

Name: MOSS-Audio 4B Thinking
Author: MOSI AI

Released: 2026-04-13
Last refreshed: 2026-06-29
Status: Researched 44 days ago

Open source · Commercial use: permitted · Multimodal · Vision

MOSS-Audio 4B Thinking is worth evaluating for audio understanding when its provider route and context window match the workload.

Use it for

Teams evaluating audio understanding
Buyers comparing 1 tracked provider route

Do not use it for

Strict JSON or tool-calling flows

Specifications

Family: MOSS-Audio
Released: 2026-04-13
Parameters: 4.6B
Architecture: Audio / Speech
Specialization: audio-understanding
Openness: Open source
License: Apache 2.0 (OSI-approved; commercial use permitted)
Weights: Available
Code: Unknown
Training: Pretrained

Created by

MOSI AI

OpenMOSS speech, audio, and video foundation-model research.

Shanghai, China


Pricing

Output / 1M

Input / 1M

Cheapest of 1 route · Hugging Face Inference Endpoints

Providers (1)

Hugging Face Inference Endpoints



About

MOSS-Audio 4B Thinking is the reasoning-tuned 4.6B variant of the open-weight audio understanding model from MOSI AI and the OpenMOSS Team. It pairs the MOSS-Audio encoder with a Qwen3-4B backbone and adds chain-of-thought-oriented post-training for stronger complex audio reasoning, while retaining speech, sound, music, timestamp, captioning, and QA coverage.
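
Because the model is tuned to reason before it answers, downstream code may want to separate the reasoning trace from the final answer. The following is a minimal sketch, not the model's documented output format. It assumes the raw text wraps its reasoning in Qwen3-style `<think>...</think>` tags, which this page does not confirm. The helper name `split_reasoning` and the sample string are hypothetical.

```python
import re

# Assumption: reasoning is wrapped in Qwen3-style <think>...</think> tags.
# This page does not document the model's actual output format.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output.

    If no <think> block is present, reasoning is an empty string and
    the whole text (stripped) is returned as the answer.
    """
    # Collect every reasoning block, trimmed, one per line.
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    # Whatever remains after removing the blocks is treated as the answer.
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


# Hypothetical raw output, for illustration only.
raw = "<think>A single bark is audible near the start.</think>A dog barks once."
reasoning, answer = split_reasoning(raw)
print(answer)
```

An unclosed `<think>` tag is left in the answer untouched, so callers that need strict structure should validate the result themselves.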

MOSS-Audio 4B Thinking is an open-source model in the MOSS-Audio family. The structured metadata tracks multimodal input, audio, and reasoning. This page tracks provider routes through Hugging Face Inference Endpoints. No headline benchmark score is tracked for MOSS-Audio 4B Thinking yet.