LLM Reference

Granite Speech 4.1 2B NAR

granite-speech-4.1-2b-nar

Open Source, Multimodal

About

IBM Granite Speech 4.1 2B NAR (Non-AutoRegressive) is a high-throughput speech recognition model that generates each transcription in a single forward pass rather than token by token. The architecture combines a 440M-parameter CTC speech encoder (16-layer Conformer), a 160M-parameter Q-Former projector, and a 1B-parameter bidirectional LLM editor (a LoRA-adapted Granite-4.0-1b-base). It reaches a throughput of roughly 1820x real time on a single H100 GPU at batch size 128, and a 1.29% word error rate (WER) on LibriSpeech Clean. The model is optimized for latency-sensitive production deployments and is released under the Apache 2.0 license.
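To make the two headline numbers concrete, here is a minimal Python sketch. It assumes that "1820x" means aggregate throughput (seconds of audio transcribed per second of wall-clock time across the whole batch), and that WER follows the standard definition: word-level edit distance divided by the number of reference words. The function names `processing_seconds` and `word_error_rate` are illustrative only and are not part of any Granite or IBM API.

```python
def processing_seconds(audio_seconds: float, speedup: float = 1820.0) -> float:
    """Wall-clock seconds to transcribe `audio_seconds` of audio.

    Assumption: `speedup` is aggregate throughput relative to real time
    (the ~1820x figure is quoted at batch size 128 on one H100), so this
    estimates batch throughput, not the latency of a single request.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One hour of audio at the quoted throughput.
    print(f"{processing_seconds(3600.0):.2f} s per hour of audio")
    # One dropped word out of six reference words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.4f}")
```

Under these assumptions, an hour of audio takes about two seconds of GPU time in aggregate, and a 1.29% WER corresponds to roughly 13 word errors per 1,000 reference words. Published benchmarks usually normalize text (casing, punctuation) before scoring, which this sketch does not do.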

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution

Rankings

Specifications

Released: 2026-04-29
Parameters: 2B
Specialization: speech-recognition

Created by

Creating reliable and adaptable AI solutions

Armonk, New York, United States
Founded 1945