What is Swallow used for?

Swallow is best suited to coding and math-heavy prompts. The family description and the listed model capabilities both point to these workloads.

How does Swallow compare to Claude 3?

Swallow (Tokyo Institute of Technology) is strongest on coding tasks, while Claude 3 (Anthropic) is the closest related family to consider for vision and multimodal work. Swallow has 8 listed variants with a maximum context window of 16k tokens; Claude 3 reaches 200k. Compare specifications and pricing before choosing a production model.

Which Swallow model should I use?

If price is the main constraint, check provider pricing first, because complete provider pricing for Swallow is not available in this listing. For the most capable and most recent variant, evaluate Swallow 30B, which also has the family's largest context window (16k).

Swallow Models by Tokyo Institute of Technology

Tokyo Institute of Technology | Llama 2 Community license | Open weights

8 models | 2024–2025 | Up to 16k context

Details

Researcher: Tokyo Institute of Technology

License: Llama 2 Community

Commercial use: Conditional

Models: 8

Released: 2024–2025

Max context: 16k

About

Swallow is a family of 8 AI models by Tokyo Institute of Technology, released between 2024 and 2025.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.


Current Swallow variants with use-when guidance and lifecycle status
Model	Use when	Released	Key specs	Status
Swallow 30B	Use when the workload needs 16k context and 30B parameters.	2025-02	16k context, 30B parameters	Current
Llama 3.1 Swallow 70B Instruct	Use when the workload needs 4k context and 70B parameters.	2025-01	4k context, 70B parameters	Current
Llama 3.1 Swallow 8B Instruct	Use when the workload needs 4k context and 8B parameters.	2025-01	4k context, 8B parameters	Current
Swallow 13B Instruct	Use when the workload needs 8k context and 13B parameters.	2024-12	8k context, 13B parameters	Current
Swallow 13B	Use when the workload needs 8k context and 13B parameters.	2024-12	8k context, 13B parameters	Current
Swallow 7B Instruct	Use when the workload needs 4k context and 7B parameters.	2024-09	4k context, 7B parameters	Current
Swallow 7B	Use when the workload needs 4k context and 7B parameters.	2024-09	4k context, 7B parameters	Current
Llama 3 Swallow 70B Instruct	Use when the workload needs 4k context and 70B parameters.	2024-06	4k context, 70B parameters	Current

Release Timeline

5 release groups; every variant is currently marked Current.

2025-02 (1 model): Swallow 30B (16k context, 30B parameters)

2025-01 (2 models): Llama 3.1 Swallow 70B Instruct (4k context, 70B parameters); Llama 3.1 Swallow 8B Instruct (4k context, 8B parameters)

2024-12 (2 models): Swallow 13B (8k context, 13B parameters); Swallow 13B Instruct (8k context, 13B parameters)

2024-09 (2 models): Swallow 7B (4k context, 7B parameters); Swallow 7B Instruct (4k context, 7B parameters)

2024-06 (1 model): Llama 3 Swallow 70B Instruct (4k context, 70B parameters)

Specifications (8 models)

Swallow model specifications comparison
Model	Released	Context window	Parameters
Swallow 30B	2025-02	16k	30B
Llama 3.1 Swallow 70B Instruct	2025-01	4k	70B
Llama 3.1 Swallow 8B Instruct	2025-01	4k	8B
Swallow 13B Instruct	2024-12	8k	13B
Swallow 13B	2024-12	8k	13B
Swallow 7B Instruct	2024-09	4k	7B
Swallow 7B	2024-09	4k	7B
Llama 3 Swallow 70B Instruct	2024-06	4k	70B
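The "use when" guidance mostly comes down to matching a workload's context requirement against each variant's listed window. The sketch below is a minimal illustration, assuming you want the smallest variant (by parameter count) whose listed context window covers a given budget. The `SwallowVariant` type and the `pick_variant` function are hypothetical helpers, not part of any Swallow or provider API, and the figures are copied from the specifications table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SwallowVariant:
    # Hypothetical record type; values are copied from the specifications table.
    name: str
    released: str   # release month, "YYYY-MM"
    context_k: int  # listed context window, in thousands of tokens
    params_b: int   # parameter count, in billions


VARIANTS = [
    SwallowVariant("Swallow 30B", "2025-02", 16, 30),
    SwallowVariant("Llama 3.1 Swallow 70B Instruct", "2025-01", 4, 70),
    SwallowVariant("Llama 3.1 Swallow 8B Instruct", "2025-01", 4, 8),
    SwallowVariant("Swallow 13B Instruct", "2024-12", 8, 13),
    SwallowVariant("Swallow 13B", "2024-12", 8, 13),
    SwallowVariant("Swallow 7B Instruct", "2024-09", 4, 7),
    SwallowVariant("Swallow 7B", "2024-09", 4, 7),
    SwallowVariant("Llama 3 Swallow 70B Instruct", "2024-06", 4, 70),
]


def pick_variant(required_context_k: int, instruct_only: bool = False) -> Optional[SwallowVariant]:
    """Return the smallest variant whose listed context fits, or None if none does."""
    candidates = [
        v for v in VARIANTS
        if v.context_k >= required_context_k
        and (not instruct_only or v.name.endswith("Instruct"))
    ]
    if not candidates:
        return None
    # Fewest parameters first; among equal sizes, prefer the most recent release.
    # Remaining ties keep table order, because min() returns the first minimum it sees.
    return min(candidates, key=lambda v: (v.params_b, -int(v.released.replace("-", ""))))


if __name__ == "__main__":
    for need in (4, 8, 16, 32):
        choice = pick_variant(need)
        print(need, choice.name if choice else "no listed variant fits")
```

Picking the smallest model that fits is one reasonable default when cost and latency matter; if recency or raw capability matters more, change the sort key. Note that only Swallow 30B is listed with 16k context, and it is not an Instruct variant, so `pick_variant(16, instruct_only=True)` returns `None`.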

Available From (1 provider)

NVIDIA NIM


Models (8)

Swallow 30B: 2025-02, 16k context, 30B parameters, open weights

Llama 3.1 Swallow 70B Instruct: 2025-01, 4k context, 70B parameters, 1 provider, open weights

Llama 3.1 Swallow 8B Instruct: 2025-01, 4k context, 8B parameters, 1 provider

Swallow 13B Instruct

Swallow 13B

Swallow 7B Instruct

Swallow 7B

Llama 3 Swallow 70B Instruct: 2024-06, 4k context, 70B parameters, 1 provider, open weights