
Super Jarvis

DeepSeek V4 Size: Parameters, Active Parameters, and Context Guide

DeepSeek V4 size is easiest to understand by separating total parameters, active parameters, and context length.

[Image: DeepSeek V4 model size and context illustration]

The useful distinction is total capacity versus active inference cost: Mixture-of-Experts (MoE) scaling lets a model be large without activating every parameter for every token.

Official model sizes

Model              Total parameters   Active parameters   Context
DeepSeek V4 Flash  284B               13B                 1M tokens
DeepSeek V4 Pro    1.6T               49B                 1M tokens

Sources: DeepSeek-V4-Pro model card and DeepSeek API pricing.
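The gap between the two columns is easier to feel as a ratio. A minimal sketch, using only the sizes from the table above (in billions of parameters), computes the fraction of the model that is active per token:

```python
# Active-parameter fraction for the official sizes above, in billions.
SPECS = {
    "DeepSeek V4 Flash": {"total_b": 284, "active_b": 13},
    "DeepSeek V4 Pro": {"total_b": 1600, "active_b": 49},
}

for name, s in SPECS.items():
    frac = s["active_b"] / s["total_b"]
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Both models activate only a few percent of their total parameters per token (roughly 4.6% for Flash and 3.1% for Pro), which is the whole point of the MoE design.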

What active parameters mean

DeepSeek V4 is an MoE family, so total parameters and active parameters are different numbers. Total parameters describe the full model capacity; active parameters describe the approximate number of parameters actually used to process each token during inference.

This is why Flash can be much cheaper while remaining useful: fewer active parameters mean less compute per token, and that lower inference cost shows up directly in its token prices.
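The mechanism behind "active parameters" is a learned router that sends each token to only a few experts. A minimal toy sketch (hypothetical random weights, far fewer experts than any real model, and simple top-k softmax gating) shows how only `top_k` of the expert matrices ever run for a given token:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 8, 2, 4  # toy sizes; real MoE models use far more experts

# Hypothetical toy weights: a router matrix plus one weight matrix per expert.
router = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))

def moe_forward(x):
    """Route one token vector through only top_k of n_experts experts."""
    logits = x @ router                    # router score per expert
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                   # softmax over the chosen experts only
    # Only top_k expert matmuls execute; the other experts are skipped entirely,
    # which is why active parameters are far smaller than total parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen)), chosen

x = rng.standard_normal(d)
y, used = moe_forward(x)
print(f"experts used for this token: {sorted(used.tolist())} of {n_experts}")
```

In this toy setup the token touches 2 of 8 expert weight matrices; scale the same idea up and you get a 1.6T-parameter model that only spends ~49B parameters of compute per token.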

Why 1M context matters

A 1M context window changes product design. Instead of sending only the last few messages, you can include large documents, long project histories, logs, or source files. The tradeoff is cost and latency, so context should still be curated rather than dumped blindly.
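Curating rather than dumping can be as simple as packing the highest-value chunks under a token budget. A minimal sketch, assuming a rough 4-characters-per-token estimate (a real application should count with the model's actual tokenizer) and a hypothetical priority score per chunk:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (assumption, not exact)."""
    return max(1, len(text) // 4)

def pack_context(chunks, budget):
    """Greedily keep the highest-priority chunks that still fit the budget.

    chunks: list of (priority, text) pairs; higher priority is kept first.
    Returns the kept texts and the approximate tokens consumed.
    """
    packed, used = [], 0
    for priority, text in sorted(chunks, reverse=True):
        cost = approx_tokens(text)
        if used + cost <= budget:
            packed.append(text)
            used += cost
    return packed, used

chunks = [
    (3, "system instructions"),
    (2, "latest user message"),
    (1, "old log " * 50),  # bulky low-priority history
]
ctx, used = pack_context(chunks, budget=100)
print(f"kept {len(ctx)} chunks, ~{used} tokens")
```

Even with a 1M-token budget, this kind of selection keeps latency and cost down: the budget is a ceiling, not a target.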


