
Super Jarvis

DeepSeek V4 Size: Parameters, Active Parameters, and Context Guide

DeepSeek V4 size is easiest to understand by separating total parameters, active parameters, and context length.

[Image: DeepSeek V4 model size and context illustration]

The useful distinction is total capacity versus active inference cost: Mixture-of-Experts (MoE) scaling lets a model be large without activating every parameter for every token.

Official model sizes

Model              Total parameters   Active parameters   Context
DeepSeek V4 Flash  284B               13B                 1M tokens
DeepSeek V4 Pro    1.6T               49B                 1M tokens

Sources: DeepSeek-V4-Pro model card and DeepSeek API pricing.
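The gap between the two columns is easier to feel as a ratio. A minimal sketch, using only the sizes from the table above (in billions of parameters), computes the fraction of the model that is active per token:

```python
# Active-parameter fraction for the official sizes above, in billions.
SPECS = {
    "DeepSeek V4 Flash": {"total_b": 284, "active_b": 13},
    "DeepSeek V4 Pro": {"total_b": 1600, "active_b": 49},
}

for name, s in SPECS.items():
    frac = s["active_b"] / s["total_b"]
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Both models activate only a few percent of their total parameters per token (roughly 4.6% for Flash and 3.1% for Pro), which is the whole point of the MoE design.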

What active parameters mean

DeepSeek V4 is an MoE family, so total parameters and active parameters are different numbers. Total parameters describe the full model capacity; active parameters describe the approximate number of parameters actually used to process each token during inference.

This is why Flash can be much cheaper while remaining useful: fewer active parameters mean less compute per token, and that lower inference cost shows up directly in its token prices.
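The mechanism behind "active parameters" is a learned router that sends each token to only a few experts. A minimal toy sketch (hypothetical random weights, far fewer experts than any real model, and simple top-k softmax gating) shows how only `top_k` of the expert matrices ever run for a given token:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 8, 2, 4  # toy sizes; real MoE models use far more experts

# Hypothetical toy weights: a router matrix plus one weight matrix per expert.
router = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))

def moe_forward(x):
    """Route one token vector through only top_k of n_experts experts."""
    logits = x @ router                    # router score per expert
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                   # softmax over the chosen experts only
    # Only top_k expert matmuls execute; the other experts are skipped entirely,
    # which is why active parameters are far smaller than total parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen)), chosen

x = rng.standard_normal(d)
y, used = moe_forward(x)
print(f"experts used for this token: {sorted(used.tolist())} of {n_experts}")
```

In this toy setup the token touches 2 of 8 expert weight matrices; scale the same idea up and you get a 1.6T-parameter model that only spends ~49B parameters of compute per token.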

Why 1M context matters

A 1M context window changes product design. Instead of sending only the last few messages, you can include large documents, long project histories, logs, or source files. The tradeoff is cost and latency, so context should still be curated rather than dumped blindly.
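Curating rather than dumping can be as simple as packing the highest-value chunks under a token budget. A minimal sketch, assuming a rough 4-characters-per-token estimate (a real application should count with the model's actual tokenizer) and a hypothetical priority score per chunk:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (assumption, not exact)."""
    return max(1, len(text) // 4)

def pack_context(chunks, budget):
    """Greedily keep the highest-priority chunks that still fit the budget.

    chunks: list of (priority, text) pairs; higher priority is kept first.
    Returns the kept texts and the approximate tokens consumed.
    """
    packed, used = [], 0
    for priority, text in sorted(chunks, reverse=True):
        cost = approx_tokens(text)
        if used + cost <= budget:
            packed.append(text)
            used += cost
    return packed, used

chunks = [
    (3, "system instructions"),
    (2, "latest user message"),
    (1, "old log " * 50),  # bulky low-priority history
]
ctx, used = pack_context(chunks, budget=100)
print(f"kept {len(ctx)} chunks, ~{used} tokens")
```

Even with a 1M-token budget, this kind of selection keeps latency and cost down: the budget is a ceiling, not a target.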


