DeepSeek Models by DeepSeek
About
The DeepSeek LLM family consists of open-source large language models built for general language understanding across a range of applications [4, 10]. The models are strongest in reasoning, coding, mathematics, and Chinese comprehension, and often outperform similarly sized models on benchmarks [4, 10]. The lineup includes base and chat models at two sizes, 7 billion and 67 billion parameters [4, 10]. They are trained on a dataset of 2 trillion tokens in English and Chinese [4, 10]. The architecture is based on Llama, and the 67B model uses Grouped-Query Attention to improve inference efficiency [1]. The models are available for research and commercial use. Related models such as DeepSeek-Coder and DeepSeek-VL target code generation and vision-language tasks, respectively [8, 9].
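To make the inference-efficiency point about Grouped-Query Attention concrete, the sketch below compares key/value cache size when every query head has its own KV head against the case where query heads share a smaller set of KV heads. The layer count, head counts, head dimension, and 16-bit cache values are illustrative assumptions, not figures stated on this page, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The leading factor of 2 covers the separate key and value tensors.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative assumptions only (not taken from this page).
LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM, SEQ_LEN = 95, 64, 8, 128, 4096

# Standard multi-head attention: one KV head per query head.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped-query attention: all query heads share KV_HEADS key/value heads.
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"MHA: {mha / 2**30:.2f} GiB, GQA: {gqa / 2**30:.2f} GiB")
print(f"Cache reduction: {mha // gqa}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads, which is what lets a server hold more concurrent sequences in the same memory.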
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| DeepSeek V3.1 Terminus | Use when the workload needs 164k context and structured outputs. | 2025-09 | 164k context, structured outputs | Current |
| DeepSeek V3.2 Speciale | Use when the workload needs 164k context and structured outputs. | 2025-04 | 164k context, structured outputs | Current |
| DeepSeek V3.2 Exp | Use when the workload needs 164k context, structured outputs, and code execution. | 2025-04 | 164k context, structured outputs, code execution | Current |
| Together AI Deepseek-LLM-67B-Chat | Use when the workload needs 4k context, 67B parameters, and structured outputs. | 2024-01 | 4k context, 67B parameters, structured outputs | Current |
| DeepSeek 67B Chat | Use when the workload needs 4k context, 67B parameters, and structured outputs. | 2023-11 | 4k context, 67B parameters, structured outputs | Current |
| DeepSeek 7B Chat | Use when the workload needs 4k context and 7B parameters. | 2023-11 | 4k context, 7B parameters | Current |
| DeepSeek 67B | Use when the workload needs 4k context and 67B parameters. | 2023-11 | 4k context, 67B parameters | Current |
| DeepSeek 7B | Use when the workload needs 4k context and 7B parameters. | 2023-11 | 4k context, 7B parameters | Current |
Release Timeline
4 release groups
Specifications (8 models)
| Model | Released | Context | Parameters | Structured Outputs | Code Exec |
|---|---|---|---|---|---|
| DeepSeek V3.1 Terminus | 2025-09 | 164k | 671B total, 37B active (MoE) | Yes | No |
| DeepSeek V3.2 Speciale | 2025-04 | 164k | 685B total, 37B active (MoE) | Yes | No |
| DeepSeek V3.2 Exp | 2025-04 | 164k | 685B total, 37B active (MoE) | Yes | Yes |
| Together AI Deepseek-LLM-67B-Chat | 2024-01 | 4k | 67B | Yes | No |
| DeepSeek 67B Chat | 2023-11 | 4k | 67B | Yes | No |
| DeepSeek 7B Chat | 2023-11 | 4k | 7B | No | No |
| DeepSeek 67B | 2023-11 | 4k | 67B | No | No |
| DeepSeek 7B | 2023-11 | 4k | 7B | No | No |
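The specifications above can be read as a simple filter: state the context length and capabilities a workload needs, then keep the variants that satisfy all of them. The following is a minimal sketch of that lookup using the rows of the table; `ModelSpec` and `candidates` are hypothetical names introduced for illustration and are not part of any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_k: int            # context window in thousands of tokens
    structured_outputs: bool
    code_exec: bool


# Rows copied from the specifications table.
SPECS = [
    ModelSpec("DeepSeek V3.1 Terminus", 164, True, False),
    ModelSpec("DeepSeek V3.2 Speciale", 164, True, False),
    ModelSpec("DeepSeek V3.2 Exp", 164, True, True),
    ModelSpec("Together AI Deepseek-LLM-67B-Chat", 4, True, False),
    ModelSpec("DeepSeek 67B Chat", 4, True, False),
    ModelSpec("DeepSeek 7B Chat", 4, False, False),
    ModelSpec("DeepSeek 67B", 4, False, False),
    ModelSpec("DeepSeek 7B", 4, False, False),
]


def candidates(min_context_k=0, need_structured=False, need_code_exec=False):
    """Return names of listed models that meet every stated requirement."""
    return [
        m.name
        for m in SPECS
        if m.context_k >= min_context_k
        and (m.structured_outputs or not need_structured)
        and (m.code_exec or not need_code_exec)
    ]


print(candidates(min_context_k=100, need_structured=True, need_code_exec=True))
```

With these rows, requiring long context, structured outputs, and code execution together leaves a single candidate, DeepSeek V3.2 Exp.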
Available From (7 providers)
Pricing
| Model | Provider | Input (USD / 1M tokens) | Output (USD / 1M tokens) | Type |
|---|---|---|---|---|
| DeepSeek V3.1 Terminus | OpenRouter | $0.21 | $0.79 | Serverless |
| DeepSeek V3.2 Exp | OpenRouter | $0.27 | $0.41 | Serverless |
| DeepSeek V3.1 Terminus | Vercel AI Gateway | $0.27 | $1.00 | Serverless |
| DeepSeek V3.2 Exp | Novita AI | $0.27 | $0.41 | Serverless |
| DeepSeek V3.1 Terminus | Novita AI | $0.27 | $1.00 | Serverless |
| DeepSeek V3.2 Speciale | DeepSeek Platform | $0.28 | $0.42 | Serverless |
| DeepSeek V3.2 Exp | DeepSeek Platform | $0.28 | $0.42 | Serverless |
| DeepSeek V3.2 Speciale | OpenRouter | $0.40 | $1.20 | Serverless |
| Together AI Deepseek-LLM-67B-Chat | Together AI | $0.60 | $0.60 | Serverless |
| DeepSeek 67B Chat | Together AI | $0.90 | $0.90 | Serverless |
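Because input and output tokens are priced separately, the cheapest option depends on the shape of the workload, not only on the input price. The sketch below estimates per-request cost from the prices listed in the table; `request_cost` and `cheapest` are hypothetical helpers, and the token counts in the example are made-up inputs.

```python
# (model, provider) -> (input USD per 1M tokens, output USD per 1M tokens),
# copied from the pricing table.
PRICES = {
    ("DeepSeek V3.1 Terminus", "OpenRouter"): (0.21, 0.79),
    ("DeepSeek V3.2 Exp", "OpenRouter"): (0.27, 0.41),
    ("DeepSeek V3.1 Terminus", "Vercel AI Gateway"): (0.27, 1.00),
    ("DeepSeek V3.2 Exp", "Novita AI"): (0.27, 0.41),
    ("DeepSeek V3.1 Terminus", "Novita AI"): (0.27, 1.00),
    ("DeepSeek V3.2 Speciale", "DeepSeek Platform"): (0.28, 0.42),
    ("DeepSeek V3.2 Exp", "DeepSeek Platform"): (0.28, 0.42),
    ("DeepSeek V3.2 Speciale", "OpenRouter"): (0.40, 1.20),
    ("Together AI Deepseek-LLM-67B-Chat", "Together AI"): (0.60, 0.60),
    ("DeepSeek 67B Chat", "Together AI"): (0.90, 0.90),
}


def request_cost(model, provider, input_tokens, output_tokens):
    """Estimated USD cost of one request at the listed per-1M-token prices."""
    input_price, output_price = PRICES[(model, provider)]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest(input_tokens, output_tokens):
    """Return the (model, provider) pair with the lowest estimated cost."""
    return min(PRICES, key=lambda key: request_cost(*key, input_tokens, output_tokens))


# Example workload (assumed sizes): 8,000 input tokens and 1,000 output tokens.
cost = request_cost("DeepSeek V3.1 Terminus", "OpenRouter", 8_000, 1_000)
print(f"Estimated cost: ${cost:.5f}")
print("Cheapest for this shape:", cheapest(8_000, 1_000))
```

This ignores any provider-specific discounts, caching tiers, or minimum charges, none of which appear in the table.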
Frequently Asked Questions
- What is DeepSeek used for?
- DeepSeek models are used for coding and for workloads that need structured outputs or code execution. The family description and the listed model capabilities point to those workloads as the best fit.
- How does DeepSeek compare to Janus?
- DeepSeek by DeepSeek is strongest where you need structured outputs, while Janus by DeepSeek is the closest related family to check for image generation. DeepSeek has 8 listed variants and reaches up to 164k context, so compare the specs and pricing tables before choosing a production model.
- Which DeepSeek model should I use?
- For the lowest listed input price, start with DeepSeek V3.1 Terminus through OpenRouter at $0.21 per 1M input tokens. For the broadest listed capability set, evaluate DeepSeek V3.2 Exp, the only listed variant that combines 164k context, structured outputs, and code execution.
Models (8)
DeepSeek V3.1 Terminus
DeepSeek V3.2 Speciale
DeepSeek V3.2 Exp
Together AI Deepseek-LLM-67B-Chat
DeepSeek 67B Chat
DeepSeek 7B Chat
DeepSeek 67B
DeepSeek 7B



