Capability filter | capability | beginner

Context window

Also known as: context length, context size, token window

See matching models with benchmark scores and pricing.

1,333 matching active models

tracked providers

745 models with routes

model.context

Definition

The context window is the maximum number of tokens a large language model can consider at once during inference, counting both the input and the generated output. It caps how much information the model can process in a single pass. Larger windows allow longer conversations or documents but increase computational demands.
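
As a rough illustration of the definition above, the following Python sketch checks whether a request fits a given window and estimates how a transformer's key-value cache grows with context length. The function names (`fits_in_context`, `kv_cache_bytes`) and the example model dimensions are hypothetical, chosen only for illustration; they are not part of this catalog or of any provider API. A window of `None` stands in for models listed with no fixed token cap, and the cache formula is a common approximation that ignores weights and activations.

```python
from typing import Optional


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: Optional[int]) -> bool:
    """Return True if the prompt plus the reserved output fits in the window.

    Input and output share one token budget. context_window=None is an
    assumption standing in for "no fixed token cap".
    """
    if context_window is None:
        return True
    return prompt_tokens + max_output_tokens <= context_window


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size for a transformer.

    One key and one value vector are stored per layer, per KV head, per
    token, so memory grows linearly with the number of tokens held.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    # A 10,000,000-token window with 2,000 tokens reserved for output.
    print(fits_in_context(9_998_000, 2_000, 10_000_000))  # True
    print(fits_in_context(9_999_000, 2_000, 10_000_000))  # False

    # Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 16-bit values.
    for n in (8_000, 128_000, 10_000_000):
        gib = kv_cache_bytes(n, layers=32, kv_heads=8, head_dim=128) / 2**30
        print(f"{n:>12,} tokens -> {gib:,.1f} GiB of KV cache")
```

The linear growth of the cache is one reason the recurrent models listed below, which keep a fixed-size state instead of a cache, can be described as having no fixed token cap.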

Models With Context window

Showing the first 80 matches, sorted by decision relevance, with tracked capability and provider-route evidence.

1,333 matches

Model | Release | Context | Capabilities | Provider route

RWKV-7 Goose 0.1B

RWKV-7 Goose 0.1B (approximately 190M parameters) is the smallest model in the RWKV-7 Goose series. Suitable for ultra-low-resource deployment with constant-memory inference. Uses RWKV-7 architecture with the Generalized Delta Rule. Trained on the World v2.8 corpus. Apache 2.0 licensed.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 1.6B

RWKV-6 Finch 1.6B is the smallest model in the RWKV-6 Finch series, ideal for lightweight deployments requiring constant-memory inference. Apache 2.0 licensed.

2024-04-09

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-7 Goose 0.4B

RWKV-7 Goose 0.4B (approximately 450M parameters) is a lightweight model from the RWKV-7 Goose series. Designed for edge deployment and resource-constrained environments where constant-memory O(1) inference is critical. Uses RWKV-7 architecture with the Generalized Delta Rule. Trained on the World v2.9 corpus. Apache 2.0 licensed.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 3B

RWKV-6 Finch 3B is a mid-range model in the RWKV-6 Finch series, offering a balance between capability and deployment efficiency. Apache 2.0 licensed. Constant-memory inference with no KV cache.

2024-04-09

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-7 Goose 1.5B

RWKV-7 Goose 1.5B is a mid-range model from the RWKV-7 Goose World3 series. Uses the seventh-generation RWKV architecture with Generalized Delta Rule for expressive dynamic state evolution. Trained on 3.1 trillion tokens from the multilingual World v3 corpus. Constant-memory inference with no KV cache. Apache 2.0 licensed.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 7B

RWKV-6 Finch 7B is a flagship mid-size model in the RWKV-6 series, introduced alongside the Eagle and Finch paper (arXiv 2404.05892, April 2024). The Finch 14B model was subsequently derived by stacking two Finch 7B models. Uses multi-headed matrix-valued states for improved language comprehension. Constant-memory inference. Apache 2.0 licensed.

2024-04-09

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-7 Goose 2.9B

RWKV-7 Goose 2.9B is the largest released model in the RWKV-7 Goose World3 series. Built on the seventh-generation RWKV architecture with the Generalized Delta Rule and dynamic state evolution, it achieves competitive benchmark performance against transformer models of equivalent scale. Trained on 3.1 trillion tokens from the World v3 multilingual corpus (100+ languages, BF16). As a pure recurrent architecture, it requires constant O(1) memory during inference (no KV cache) and processes sequences in linear O(n) time. Licensed Apache 2.0.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 14B

RWKV-6 Finch 14B is the largest model in the RWKV-6 Finch series, created by stacking two 7B Finch models. Released September 3, 2024. Achieves strong performance on MMLU (56.05%), ARC, HellaSwag (57.69%), and Winogrande (74.43%). The RWKV-6 architecture uses matrix-valued states and dynamic data-driven recurrence, improving comprehension and in-context reasoning compared to RWKV-5 (Eagle). Constant-memory O(1) inference with no KV cache. Apache 2.0 licensed.

2024-09-03

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

LTM-2-mini

LTM-2-mini is Magic's research prototype supporting a 100 million token context window, announced August 29, 2024. Magic reports that its sequence-dimension algorithm is roughly 1,000× cheaper than the attention mechanism in Llama 3.1 405B at a 100M-token context, and that it needs only a small fraction of a single H100's HBM per user, versus 638 H100s to hold Llama 3.1 405B's KV cache at the same context length. Not publicly released for API access or self-hosting; Magic stated they were separately training a full LTM-2 model. Specialization: coding/software development. Source: https://magic.dev/blog/100m-token-context-windows

2024-08-29

Researched 47d ago

100m

100,000,000 tokens

100m context

No tracked provider route

Llama 4 Scout 17B-16E Instruct

Meta's Llama 4 Scout is a mixture-of-experts model with 17 billion active parameters (109 billion total) and 16 experts. Optimized for efficient inference on edge and cloud environments with strong multi-turn conversation capabilities. Available on Cloudflare Workers AI.

2025-04-05

Researched 28d ago

10m

10,000,000 tokens

10m context, Vision, Multimodal, JSON

AWS Bedrock

$0.170 in / $0.220 out per 1M tokens

12 routes