LLM Reference

DeepSeek VL Models by DeepSeek

DeepSeek | DeepSeek License | Open weights
4 models | 2024 | From $0.05/1M input

Details

Researcher: DeepSeek
Commercial use: Allowed
Models: 4
Released: 2024

Capabilities

Vision: All models
Multimodal: All models

About

DeepSeek-VL is an open-source family of vision-language models built for real-world applications. It comes in 1.3B and 7B parameter sizes, each with "base" and "chat" variants. Its hybrid vision encoder handles high-resolution 1024 x 1024 images while keeping computational cost low. Vision-language data is integrated strategically during training so that multimodal ability does not come at the expense of language performance. The pretraining dataset draws on Common Crawl, web code, e-books, and educational content, and DeepSeek-VL achieves competitive or state-of-the-art results across various benchmarks. The models aim to narrow the performance gap between open-source and closed-source systems and are available on platforms such as Hugging Face.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

Use when the workload needs 7B parameters and multimodal inputs.

2024-03 | 7B parameters | multimodal inputs

Use when the workload needs 1.3B parameters and multimodal inputs.

2024-03 | 1.3B parameters | multimodal inputs

Use when the workload needs 7B parameters and multimodal inputs.

2024-03 | 7B parameters | multimodal inputs

Use when the workload needs 1.3B parameters and multimodal inputs.

2024-03 | 1.3B parameters | multimodal inputs

Release Timeline

2024-03 (1 release group, 4 current)
DeepSeek VL 1.3B | 1.3B parameters | multimodal inputs | Current
DeepSeek VL 1.3B Chat | 1.3B parameters | multimodal inputs | Current
DeepSeek VL 7B | 7B parameters | multimodal inputs | Current
DeepSeek VL 7B Chat | 7B parameters | multimodal inputs | Current

Specifications (4 models)

DeepSeek VL model specifications comparison
Model | Released | Parameters | Vision | Multimodal
DeepSeek VL 7B | 2024-03 | 7B | Yes | Yes
DeepSeek VL 1.3B | 2024-03 | 1.3B | Yes | Yes
DeepSeek VL 7B Chat | 2024-03 | 7B | Yes | Yes
DeepSeek VL 1.3B Chat | 2024-03 | 1.3B | Yes | Yes

Available From (1 provider)

Pricing

DeepSeek VL model pricing by provider
Model | Provider | Input / 1M | Output / 1M | Type
DeepSeek VL 7B | Replicate API | $0.05 | $0.25 | Serverless
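
As a rough illustration of how the listed per-token rates turn into a bill, the sketch below estimates cost from token counts. The helper name `estimate_cost_usd` is hypothetical, and the sketch assumes the workload is already measured in tokens; how image inputs are counted toward tokens is not stated here.

```python
from decimal import Decimal

# Listed Replicate API rates for DeepSeek VL 7B, in USD per 1M tokens.
INPUT_USD_PER_MILLION = Decimal("0.05")
OUTPUT_USD_PER_MILLION = Decimal("0.25")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimate cost in USD at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_USD_PER_MILLION * input_tokens / TOKENS_PER_MILLION
    output_cost = OUTPUT_USD_PER_MILLION * output_tokens / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload (assumed figures): 2M input tokens, 400K output tokens.
    print(estimate_cost_usd(2_000_000, 400_000))
```

Because output tokens are listed at five times the input rate, response length can dominate the total even when prompts are much longer than responses.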

Frequently Asked Questions

What is DeepSeek VL used for?
DeepSeek VL is used for vision and multimodal work, meaning tasks that combine image and text inputs. The family description and the listed capabilities (Vision and Multimodal on all four models) point to those workloads as the best fit.
How does DeepSeek VL compare to Janus?
Both families come from DeepSeek. DeepSeek VL is the better fit for vision and multimodal work, while Janus is the closest related family to check if you need image generation. DeepSeek VL has 4 listed variants, so compare the specifications and pricing tables before choosing a production model.
Which DeepSeek VL model should I use?
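For hosted use, the only listed option is DeepSeek VL 7B through the Replicate API, at $0.05 per 1M input tokens and $0.25 per 1M output tokens. For local use, DeepSeek VL 7B is the largest size in the family and the one to evaluate first when capability matters most; the 1.3B models are the smaller alternative.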