Forem

# quantization

Posts

Why your quantized LLM loses its MTP heads and how to keep them

5 min read
The Best Result This Week Was a Failed Prediction — Phase-3a Doesn't Transfer

1 min read
Two Localizers, Both Wrong: Bounding a Quantization Cost That Wouldn't Close

1 min read
When the Sensitivity Metric Lies: A Drift-Inversion Smoking Gun in Mixed-Precision LLM Quantization

8 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

4 min read
1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4

2
Comments 1
2 min read
KVQuant: Run 70B LLMs on 8GB RAM with KV Cache Quantization

1 min read
KVQuant: Run 70B LLMs on 8GB RAM with 4-bit KV Cache Quantization

1 min read
Traditional Quantization vs 1.58-Bit Ternary Models: A Practical Comparison

Comments 1
5 min read
GIMP's Posterization: Simple Quantization vs. Median Cut for Better Visuals

8 min read
Q4 KV Cache Fit 32K Context into 8GB VRAM — Only Math Broke

8 min read
Postmortem: How a Quantization Error in Llama 3.2 7B Caused Incorrect Code Suggestions for 500 Users

13 min read
Chasing 16MB: My Parameter Golf Journey and What I Learned the Hard Way

1
3 min read
Building a Vector Database That Never Decompresses Your Vectors

2
16 min read