LLM Reference

transformer architecture

Definition

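The Transformer architecture is a sequence-modeling design built from encoder stacks, decoder stacks, or both. Each layer combines multi-head self-attention with a position-wise feed-forward network, and positional encodings supply the token-order information that attention alone does not carry. Because self-attention relates every position in a sequence to every other position in a single step, rather than processing tokens one at a time, training parallelizes well and each token can draw on context from anywhere in the sequence. Decoder-only Transformers, which mask attention so that each position sees only earlier positions, power autoregressive generation in models like GPT and Llama.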
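The self-attention step described in the definition can be illustrated with a minimal sketch in plain Python. It shows single-head scaled dot-product attention with the causal mask used by decoder-only models. The function names (softmax, causal_self_attention) and the toy input are illustrative assumptions rather than any library's API, and the sketch leaves out the learned query/key/value projections, multiple heads, positional encodings, and feed-forward layers that a real Transformer layer includes.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors (lists of floats), one per position.
    Position i attends only to positions 0..i, as in a decoder-only model.
    Hypothetical helper for illustration; no learned projections are applied.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the query against positions 0..i only; later ones are masked.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        # Pad with zeros so every row of weights has one entry per position.
        all_weights.append(weights + [0.0] * (len(queries) - i - 1))
    return outputs, all_weights

# Toy input: 3 positions with 2-dimensional vectors (arbitrary values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
```

In this sketch the first position can attend only to itself, so its weight row is entirely on position 0, while later rows spread their weight over all earlier positions. The loop over positions is written out for readability; practical implementations compute all positions at once as matrix products, which is where the parallelism comes from.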

Models mentioning transformer architecture (12)