Quantization and Fast Inference

Original price was: ₹5,819.00. Current price is: ₹4,364.00.

SKU: 9781633433915

Description

Today’s AI models demand a lot of memory, compute, and server capacity, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression. It presents practical techniques for quantization, the systematic reduction of numerical precision, to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.

From quantization fundamentals to runtime packaging, the book covers the full quantization pipeline. It starts by deriving the quantization mapping from first principles, then builds your skills through production-tested post-training quantization (PTQ) and quantization-aware training (QAT) workflows, ending with a fully compressed deployment. You’ll learn to apply PTQ to production models, run QAT using fake quantization and straight-through estimators, and handle subtle tradeoffs such as activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4.

Additional information

Weight 0.5 kg
Dimensions 11 × 11 × 11 cm
Shipping Time Pre-Order Now
