MOSS-Audio 4B Instruct

Name: MOSS-Audio 4B Instruct
Author: MOSI AI

Released

2026-04-13

Last refreshed

2026-06-29

Status

Researched 44d ago

Open source · Commercial use: permitted · Multimodal · Vision

MOSS-Audio 4B Instruct is worth evaluating for audio understanding when its provider route and context window match the workload.

Use it for

Teams evaluating audio understanding
Buyers comparing 1 tracked provider route

Do not use it for

Strict JSON or tool-calling flows

Specifications

Family: MOSS-Audio
Released: 2026-04-13
Parameters: 4.6B
Architecture: Audio / Speech
Specialization: audio-understanding
Openness: Open source
License: Apache 2.0 (OSI-approved; commercial use permitted)
Weights: Available
Code: Unknown
Training: Pretrained
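
The fields above can be captured as a small record for shortlisting. The following is a minimal sketch, not part of any provider API: the `ModelSpec` class and the `passes_shortlist` helper are hypothetical names introduced for illustration, and the only data used is what the Specifications list states.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record mirroring the Specifications fields."""
    name: str
    family: str
    released: date
    parameters_b: float  # parameter count, in billions
    specialization: str
    license: str
    weights_available: bool
    commercial_use: bool


# Values copied from the Specifications list; nothing else is assumed.
MOSS_AUDIO_4B_INSTRUCT = ModelSpec(
    name="MOSS-Audio 4B Instruct",
    family="MOSS-Audio",
    released=date(2026, 4, 13),
    parameters_b=4.6,
    specialization="audio-understanding",
    license="Apache 2.0",
    weights_available=True,
    commercial_use=True,
)


def passes_shortlist(
    spec: ModelSpec,
    task: str,
    max_parameters_b: Optional[float] = None,
    require_commercial_use: bool = True,
) -> bool:
    """Return True if the spec matches the task and the stated constraints.

    Hypothetical filter: it also requires downloadable weights, which is an
    assumption about what a shortlist would need, not a rule from this page.
    """
    if spec.specialization != task:
        return False
    if require_commercial_use and not spec.commercial_use:
        return False
    if max_parameters_b is not None and spec.parameters_b > max_parameters_b:
        return False
    return spec.weights_available
```

Calling `passes_shortlist(MOSS_AUDIO_4B_INSTRUCT, "audio-understanding")` accepts the model, while asking for a different task or a parameter ceiling below 4.6B rejects it.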

Created by

MOSI AI

OpenMOSS speech, audio, and video foundation-model research.

Shanghai, China


Pricing

Output / 1M

Input / 1M

Cheapest of 1 route · Hugging Face Inference Endpoints

Providers (1)

Hugging Face Inference Endpoints


Links

Website · Hugging Face

About

MOSS-Audio 4B Instruct is the instruction-following 4.6B variant of MOSI AI and OpenMOSS Team's open-weight audio understanding model. It combines a MOSS-Audio encoder with a Qwen3-4B language backbone for speech, environmental sound, music, captioning, time-aware question answering, timestamped ASR, and audio-grounded reasoning.

MOSS-Audio 4B Instruct is an open-source model in the MOSS-Audio family. Its structured metadata records multimodal input, including audio. This page tracks one provider route, Hugging Face Inference Endpoints. No headline benchmark score is tracked for the model yet.