LLM Reference

DeepSeek MoE

2 models · 2024 · Up to 4K context

About

The DeepSeek MoE family is a series of large language models that use the Mixture-of-Experts (MoE) architecture to trade total parameter count for computational efficiency. By activating only a subset of parameters, known as "experts," per token, these models reduce compute cost while maintaining robust capabilities. They combine fine-grained expert segmentation, which divides experts into smaller, more specialized units, with shared expert isolation, which keeps a few always-active experts for common knowledge and reduces redundancy among the routed experts. The lineup includes DeepSeekMoE 16B, which matches the performance of LLaMA2 7B with significantly less computation, and the DeepSeek-V2 family, which adds Multi-head Latent Attention (MLA) for more efficient inference. Available on platforms like Hugging Face and GitHub, DeepSeek models span a range from the 236-billion-parameter DeepSeek-V2 to the more compact DeepSeek-V2-Lite, facilitating open-source research and development.
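The routing scheme described above can be illustrated with a minimal sketch. This is not DeepSeek's implementation; the expert functions, dimensions, and gating weights below are hypothetical stand-ins, and the point is only the control flow: shared experts always run, while just the top-k routed experts are activated per token.

```python
import numpy as np

def moe_layer(x, shared_experts, routed_experts, gate_weights, top_k=2):
    """Toy DeepSeekMoE-style layer (illustrative, not the real architecture).

    shared_experts: list of functions applied to every token.
    routed_experts: list of functions, of which only top_k run per token.
    gate_weights:   (d, num_routed) router matrix scoring each routed expert.
    """
    # Shared experts process every token (common knowledge, no routing).
    out = sum(expert(x) for expert in shared_experts)

    # Router scores each routed expert for this token.
    scores = x @ gate_weights                 # shape: (num_routed,)
    top = np.argsort(scores)[-top_k:]         # indices of the top-k experts

    # Softmax over only the selected experts' scores.
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()

    # Only the selected experts compute; their outputs are gate-weighted.
    for w, i in zip(weights, top):
        out = out + w * routed_experts[i](x)
    return out

# Usage with random linear "experts" (purely for shape-checking).
rng = np.random.default_rng(0)
d, num_routed = 8, 4
shared = [lambda x, W=rng.standard_normal((d, d)): x @ W]
routed = [lambda x, W=rng.standard_normal((d, d)): x @ W
          for _ in range(num_routed)]
gate = rng.standard_normal((d, num_routed))
y = moe_layer(rng.standard_normal(d), shared, routed, gate)
```

With fine-grained segmentation, `num_routed` grows while each expert shrinks, so the same compute budget selects a more specialized combination of experts per token.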

Specifications (2 models)

DeepSeek MoE model specifications comparison
Model                   Released   Context   Parameters
DeepSeek MoE 16B        2024-01    4K        16B
DeepSeek MoE 16B Chat   2024-01    4K        16B

Frequently Asked Questions

What is DeepSeek MoE?
DeepSeek MoE is a family of large language models built on the Mixture-of-Experts (MoE) architecture, which activates only a subset of expert parameters per token to reduce computation. The family combines fine-grained expert segmentation with shared expert isolation; DeepSeekMoE 16B matches the performance of LLaMA2 7B with significantly less computation, and the models are openly available on Hugging Face and GitHub.
How many models are in the DeepSeek MoE family?
The DeepSeek MoE family contains 2 models.
What is the latest DeepSeek MoE model?
Both models were released in 2024-01; DeepSeek MoE 16B Chat is the instruction-tuned variant of the DeepSeek MoE 16B base model.

Models (2)