LLM Reference

MOSS-Audio 8B Instruct

Released: 2026-04-13
Last refreshed: 2026-06-04
Status: Researched today
Multimodal · Vision · Open Source

MOSS-Audio 8B Instruct is worth evaluating for vision when its provider route and context window match the workload.

Use it for

  • Teams evaluating vision
  • Buyers comparing 1 tracked provider route

Do not use it for

  • Strict JSON or tool-calling flows

Specifications

Released: 2026-04-13
Parameters: 8.6B
Architecture: audio-language-transformer
Specialization: audio-understanding
License: Apache 2.0
Training: pretrained

Created by

OpenMOSS audio and video foundation-model research.

Shanghai, China

Pricing

Output / 1M: -
Input / 1M: -

Cheapest of 1 route · Hugging Face Inference Endpoints

About

MOSS-Audio 8B Instruct is the instruction-following 8.6B variant of MOSI Intelligence and OpenMOSS Team's open-weight audio understanding model. It pairs the MOSS-Audio encoder with a Qwen3-8B language backbone and is positioned for stronger open-source speech, sound, music, audio captioning, ASR, timestamp, and QA workloads.

MOSS-Audio 8B Instruct belongs to the MOSS-Audio family. Its structured metadata lists multimodal input and audio as capabilities. This page tracks one provider route, through Hugging Face Inference Endpoints. No headline benchmark score is tracked for the model yet.

Top use-case fit

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare API pricing across 1 provider for input and output tokens, plus batch and cached reads when available.

Provider | Input / 1M | Output / 1M | Route
Hugging Face Inference Endpoints | - | - | Partial

Capabilities

Multimodal · Audio

Benchmark peer bars for Vision

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (7)