Swallow Models by Tokyo Institute of Technology
About
Swallow is a family of 8 AI models by Tokyo Institute of Technology, released between 2024 and 2025.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Swallow 30B | Use when the workload needs 16K context and 30B parameters. | 2025-02 | 16K context, 30B parameters | Current |
| Llama 3.1 Swallow 70B Instruct | Use when the workload needs 4K context and 70B parameters. | 2025-01 | 4K context, 70B parameters | Current |
| Llama 3.1 Swallow 8B Instruct | Use when the workload needs 4K context and 8B parameters. | 2025-01 | 4K context, 8B parameters | Current |
| Swallow 13B Instruct | Use when the workload needs 8K context and 13B parameters. | 2024-12 | 8K context, 13B parameters | Current |
| Swallow 13B | Use when the workload needs 8K context and 13B parameters. | 2024-12 | 8K context, 13B parameters | Current |
| Swallow 7B Instruct | Use when the workload needs 4K context and 7B parameters. | 2024-09 | 4K context, 7B parameters | Current |
| Swallow 7B | Use when the workload needs 4K context and 7B parameters. | 2024-09 | 4K context, 7B parameters | Current |
| Llama 3 Swallow 70B Instruct | Use when the workload needs 4K context and 70B parameters. | 2024-06 | 4K context, 70B parameters | Current |
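The use-when sentences in the table above all follow one pattern built from the context and parameter signals. A minimal sketch of that templating, assuming a hypothetical `use_when` helper that takes the two fields as plain strings (the page does not show the actual generation logic, so this only reproduces the visible wording):

```python
def use_when(context: str, parameters: str) -> str:
    """Build a use-when sentence from two spec fields (hypothetical helper)."""
    return (
        f"Use when the workload needs {context} context "
        f"and {parameters} parameters."
    )


# Values copied from the Swallow 30B row.
print(use_when("16K", "30B"))
```

Because the sentence depends only on context and parameter count, models that share both values (for example Swallow 13B and Swallow 13B Instruct) get identical guidance, so the model name and release date are still needed to tell them apart.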
Release Timeline
5 release groups
Specifications (8 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| Swallow 30B | 2025-02 | 16K | 30B |
| Llama 3.1 Swallow 70B Instruct | 2025-01 | 4K | 70B |
| Llama 3.1 Swallow 8B Instruct | 2025-01 | 4K | 8B |
| Swallow 13B Instruct | 2024-12 | 8K | 13B |
| Swallow 13B | 2024-12 | 8K | 13B |
| Swallow 7B Instruct | 2024-09 | 4K | 7B |
| Swallow 7B | 2024-09 | 4K | 7B |
| Llama 3 Swallow 70B Instruct | 2024-06 | 4K | 70B |
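One way to shortlist a variant from the specifications table is to filter on a minimum context window and a maximum parameter count. The sketch below is illustrative only: the `Spec` class and `candidates` function are hypothetical names, and the rows are copied by hand from the table rather than fetched from any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    name: str
    released: str   # YYYY-MM, as listed in the table
    context_k: int  # context window, in thousands of tokens
    params_b: int   # parameter count, in billions


# Rows copied from the specifications table.
SPECS = [
    Spec("Swallow 30B", "2025-02", 16, 30),
    Spec("Llama 3.1 Swallow 70B Instruct", "2025-01", 4, 70),
    Spec("Llama 3.1 Swallow 8B Instruct", "2025-01", 4, 8),
    Spec("Swallow 13B Instruct", "2024-12", 8, 13),
    Spec("Swallow 13B", "2024-12", 8, 13),
    Spec("Swallow 7B Instruct", "2024-09", 4, 7),
    Spec("Swallow 7B", "2024-09", 4, 7),
    Spec("Llama 3 Swallow 70B Instruct", "2024-06", 4, 70),
]


def candidates(min_context_k: int, max_params_b: int) -> list[Spec]:
    """Return models meeting both limits, newest release first (hypothetical helper)."""
    matches = [
        s for s in SPECS
        if s.context_k >= min_context_k and s.params_b <= max_params_b
    ]
    # YYYY-MM strings sort chronologically, so a plain string sort is enough.
    return sorted(matches, key=lambda s: s.released, reverse=True)


for spec in candidates(min_context_k=8, max_params_b=30):
    print(spec.name, spec.released)
```

The filter covers only the two columns the table provides; pricing and provider availability would need to be checked separately.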
Available From (1 provider)
Frequently Asked Questions
- What is Swallow used for?
- Swallow is used for coding and math-heavy prompts. The family description and listed model capabilities point to those workloads as the best fit.
- How does Swallow compare to Claude 3?
- Swallow by Tokyo Institute of Technology is strongest for coding workloads, while Claude 3 by Anthropic is the closest related family to check for vision and multimodal work. Swallow has 8 listed variants and reaches up to 16K context; Claude 3 reaches up to 200K context. Compare the specs and pricing tables before choosing a production model.
- Which Swallow model should I use?
- If price is the main constraint, check the pricing table first, keeping in mind that provider pricing for Swallow is incomplete in the local data. For the latest listed model with the longest context, evaluate Swallow 30B (16K context).
Models (8)
Swallow 30B
Llama 3.1 Swallow 70B Instruct
Llama 3.1 Swallow 8B Instruct
Swallow 13B Instruct
Swallow 13B
Swallow 7B Instruct
Swallow 7B
Llama 3 Swallow 70B Instruct
