Official code for the Manning book on structural LLM optimization: depth/width pruning, knowledge distillation, and attention optimization, runnable on free Colab GPUs.
quantization fine-tuning ai-fairness model-optimization large-language-models llm qlora attention-optimization mode-compression knowledge-distilation width-pruning
Updated Jun 20, 2026 - Jupyter Notebook