Using DeepSeek V4 Flash on Novita AI

Implementation guide · DeepSeek V4 · DeepSeek

ServerlessOpen Source

Quick Start

1
Create an account at Novita AI and generate an API key.
2
Use the Novita AI SDK or REST API to call deepseek-v4-flash.
3
You'll be billed $0.14/1M input, $0.28/1M output tokens. See full pricing.

API Portal Pricing Model Card

Code Examples

Code examples for this provider have not been sourced yet.

About Novita AI

Novita AI offers a GPU-based inference API for image, video, and language model generation with a broad catalog of open-source models.

View all models on Novita AI →

Pricing on Novita AI

Type	Price (per 1M)
Input tokens	$0.14
Output tokens	$0.28

Capabilities

ReasoningFunction CallingTool UseStructured OutputsPrompt Caching

About DeepSeek V4 Flash

DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.

Full model details →

Model Specs

Released2026-04-24

Parameters284B

Context1m

ArchitectureMixture of Experts