LLM Reference

Swallow Models by Tokyo Institute of Technology

8 models · 2024–2025 · Up to 16K context

About

Swallow is a family of 8 AI models by Tokyo Institute of Technology, released between 2024 and 2025.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
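As a rough illustration of how such guidance could be generated from the listed fields, here is a minimal sketch. The `Variant` record and the `use_when` function are hypothetical names introduced for this example, not part of the site's actual tooling, and the sketch uses only the context and parameter fields (the capability and replacement fields are not shown on this page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record holding the fields shown on each variant card."""
    name: str        # model name, e.g. "Swallow 30B"
    released: str    # release month as YYYY-MM
    context: str     # context window label, e.g. "16K"
    parameters: str  # parameter count label, e.g. "30B"


def use_when(variant: Variant) -> str:
    """Build the guidance sentence from the context and parameter fields."""
    return (
        f"Use when the workload needs {variant.context} context "
        f"and {variant.parameters} parameters."
    )


swallow_30b = Variant("Swallow 30B", "2025-02", "16K", "30B")
print(use_when(swallow_30b))
```

Because the sentence is a plain template over two fields, variants with the same context and parameter count (for example a base model and its Instruct version) receive identical guidance.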

Swallow 30B (Current)
Use when the workload needs 16K context and 30B parameters.
2025-02 · 16K context · 30B parameters

Llama 3.1 Swallow 70B Instruct (Current)
Use when the workload needs 4K context and 70B parameters.
2025-01 · 4K context · 70B parameters

Llama 3.1 Swallow 8B Instruct (Current)
Use when the workload needs 4K context and 8B parameters.
2025-01 · 4K context · 8B parameters

Swallow 13B Instruct (Current)
Use when the workload needs 8K context and 13B parameters.
2024-12 · 8K context · 13B parameters

Swallow 13B (Current)
Use when the workload needs 8K context and 13B parameters.
2024-12 · 8K context · 13B parameters

Swallow 7B Instruct (Current)
Use when the workload needs 4K context and 7B parameters.
2024-09 · 4K context · 7B parameters

Swallow 7B (Current)
Use when the workload needs 4K context and 7B parameters.
2024-09 · 4K context · 7B parameters

Llama 3 Swallow 70B Instruct (Current)
Use when the workload needs 4K context and 70B parameters.
2024-06 · 4K context · 70B parameters

Release Timeline

5 release groups

2025-02 (1 current)
Swallow 30B · 16K context · 30B parameters · Current

2025-01 (2 current)
Llama 3.1 Swallow 70B Instruct · 4K context · 70B parameters · Current
Llama 3.1 Swallow 8B Instruct · 4K context · 8B parameters · Current

2024-12 (2 current)
Swallow 13B · 8K context · 13B parameters · Current
Swallow 13B Instruct · 8K context · 13B parameters · Current

2024-09 (2 current)
Swallow 7B · 4K context · 7B parameters · Current
Swallow 7B Instruct · 4K context · 7B parameters · Current

2024-06 (1 current)
Llama 3 Swallow 70B Instruct · 4K context · 70B parameters · Current

Specifications (8 models)

Swallow model specifications comparison

| Model | Released | Context | Parameters |
| --- | --- | --- | --- |
| Swallow 30B | 2025-02 | 16K | 30B |
| Llama 3.1 Swallow 70B Instruct | 2025-01 | 4K | 70B |
| Llama 3.1 Swallow 8B Instruct | 2025-01 | 4K | 8B |
| Swallow 13B Instruct | 2024-12 | 8K | 13B |
| Swallow 13B | 2024-12 | 8K | 13B |
| Swallow 7B Instruct | 2024-09 | 4K | 7B |
| Swallow 7B | 2024-09 | 4K | 7B |
| Llama 3 Swallow 70B Instruct | 2024-06 | 4K | 70B |

Available From (1 provider)

Frequently Asked Questions

What is Swallow used for?
According to the family description and the listed model capabilities, Swallow is best suited to coding and math-heavy prompts.

How does Swallow compare to Claude 3?
Swallow by Tokyo Institute of Technology is listed as strongest for coding, while Claude 3 by Anthropic is the closest related family to consider for vision and multimodal work. Swallow has 8 listed variants and reaches up to 16K context; Claude 3 reaches up to 200K context. Compare the specifications and pricing tables before choosing a production model.

Which Swallow model should I use?
If price is the main constraint, start with the pricing table, but note that the local data does not include complete provider pricing for Swallow. For the latest and largest-context option in the local data, evaluate Swallow 30B with 16K context.
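When price data is incomplete, one way to narrow the choice is to filter the specifications table by context and size limits. The sketch below is only an illustration under that assumption: the `SPECS` rows are copied from the table on this page, while the `shortlist` function and its parameters are hypothetical names introduced here.

```python
# Rows copied from the specifications table:
# (name, released YYYY-MM, context in K tokens, parameters in billions).
SPECS = [
    ("Swallow 30B", "2025-02", 16, 30),
    ("Llama 3.1 Swallow 70B Instruct", "2025-01", 4, 70),
    ("Llama 3.1 Swallow 8B Instruct", "2025-01", 4, 8),
    ("Swallow 13B Instruct", "2024-12", 8, 13),
    ("Swallow 13B", "2024-12", 8, 13),
    ("Swallow 7B Instruct", "2024-09", 4, 7),
    ("Swallow 7B", "2024-09", 4, 7),
    ("Llama 3 Swallow 70B Instruct", "2024-06", 4, 70),
]


def shortlist(min_context_k: int, max_params_b: int) -> list[str]:
    """Return names of models that meet both limits, newest release first."""
    matches = [
        row for row in SPECS
        if row[2] >= min_context_k and row[3] <= max_params_b
    ]
    # YYYY-MM strings sort chronologically; the sort is stable, so models
    # released in the same month keep their table order.
    matches.sort(key=lambda row: row[1], reverse=True)
    return [row[0] for row in matches]


# Example: at least 8K context, at most 30B parameters.
print(shortlist(min_context_k=8, max_params_b=30))
```

The filter deliberately ignores pricing and capability data, since neither is available in full on this page; treat the result as a starting list to evaluate, not a recommendation.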

Models (8)