DeepSeek 67B
DeepSeek 67B has tracked model metadata but no tracked provider pricing, which keeps it from being a default production pick.
Use it for
- Teams evaluating general LLM work
- Workloads that can use a 4k context window
Do not use it for
- Cost-sensitive launches that need sourced token pricing
- Vision or document-understanding workloads
- Strict JSON or tool-calling flows
- Family: DeepSeek
- Released: 2023-11-29
- Context: 4k
- Parameters: 67B
- Architecture: Decoder Only
- Knowledge cutoff: 2023-05
- Specialization: general
- Training: finetuned
- Fine-tuning: base
About
DeepSeek LLM 67B is a 67-billion-parameter large language model trained on 2 trillion tokens of English and Chinese text. It builds on the LLaMA architecture and uses Grouped-Query Attention (GQA) to improve computational efficiency. The model is reported to surpass the similarly sized Llama2 70B on various benchmarks covering reasoning, coding, mathematics, and Chinese comprehension. Its chat variant achieves a 73.78% pass rate on the HumanEval coding benchmark and performs well on mathematical datasets such as GSM8K. DeepSeek 67B is open source and supports both research and commercial use. It shares common LLM limitations, including biases inherited from training data and possible hallucinations.
DeepSeek 67B is an open-source model in the DeepSeek family. The structured metadata tracks a 4k-token context window. No headline benchmark score is tracked for DeepSeek 67B yet.
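To make the efficiency claim for Grouped-Query Attention more concrete, the following is a minimal sketch of the general idea: several query heads share one key/value head, so the key/value cache shrinks by the group size. The function names and every numeric value here (layer count, head counts, head dimension) are illustrative assumptions, not DeepSeek 67B's actual configuration; only the 4k sequence length is taken from the metadata above.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared key/value head that a query head reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache for one sequence (2 bytes per value ~ fp16)."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative values only (assumptions, NOT the real DeepSeek 67B config).
N_LAYERS, N_Q_HEADS, N_KV_HEADS, HEAD_DIM = 4, 8, 2, 64
SEQ_LEN = 4096  # the 4k context window tracked in the metadata

# Which KV head each of the 8 query heads uses.
mapping = [kv_head_for_query_head(q, N_Q_HEADS, N_KV_HEADS) for q in range(N_Q_HEADS)]

# Standard multi-head attention keeps one KV head per query head.
mha_bytes = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN)
# GQA keeps only the shared KV heads.
gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(mapping)                 # [0, 0, 0, 0, 1, 1, 1, 1]
print(mha_bytes // gqa_bytes)  # 4
```

With these toy numbers, the cache is four times smaller than it would be with one key/value head per query head. The ratio always equals the number of query heads divided by the number of key/value heads.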
Top use-case fit
No primary decision-task fit is mapped for this model yet.
Provider price ladder
No tracked provider token pricing is available for this model yet.
Capabilities
No model capability flags are currently sourced.
Benchmark peer bars for Coding
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.