transformer architecture

Definition

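The Transformer architecture consists of encoder and/or decoder stacks built from multi-head self-attention, feed-forward layers, and positional encodings, and is used for sequence modeling. Because self-attention relates all positions of a sequence at once instead of step by step, training parallelizes well across the sequence, and each token can draw on context from anywhere in the input. Decoder-only Transformers power autoregressive generation in models such as GPT and Llama.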
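As a rough illustration of the self-attention step in a decoder-only model, the sketch below computes single-head scaled dot-product attention with a causal mask in plain Python. It is a simplified teaching example under stated assumptions, not how any listed model is implemented: the function names `softmax` and `causal_self_attention` are hypothetical, the learned query/key/value projections are replaced by passing the same toy vectors for all three, and multiple heads, positional encodings, and feed-forward layers are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries, keys, values: lists of float vectors, one vector per position.
    Position i attends only to positions 0..i (decoder-style masking).
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    d_k = len(keys[0])
    n = len(queries)
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: score only the keys at positions <= i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w[j] * values[j][c] for j in range(i + 1))
            for c in range(len(values[0]))
        ]
        # Pad masked (future) positions with zero weight for readability.
        weights.append(w + [0.0] * (n - i - 1))
        outputs.append(out)
    return outputs, weights


# Toy input: three positions with 2-dimensional embeddings (illustrative values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for row in weights:
    print([round(v, 3) for v in row])
```

The explicit loop over positions is only for readability. Practical implementations compute all positions together with batched matrix operations, which is where the parallelism mentioned above comes from.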

Models Using transformer architecture (12)

Granite 4.1 3B Base (2026-04)
Nemotron 3 Nano Omni (2026-04)
Nemotron 3 8B (2026-03)
Granite 4.0 (2025-11)
Granite 4.0 Micro (2025-10)
LingoWhale 8B (2024-09)
Cerebras LLaVA 7B (2024-08)
Llama 3.1 405B Instruct (2024-07)
Llama 3.1 8B Instruct (2024-07)
Llama 3.1 405B (2024-07)
Llama 3.1 70B (2024-07)
StarChat2 15B (2024-07)