arXiv.org
Neural Network Optimization Techniques — 2024 Survey
Abstract
This paper surveys optimization methods for large neural networks, covering gradient descent variants, memory-efficient training, quantization strategies, and inference acceleration techniques. It synthesizes findings from 847 peer-reviewed publications that appeared between 2021 and 2024.
1. Gradient Descent Variants
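Modern optimization has converged on adaptive methods. Adam and related optimizers (AdamW, Adan, Lion) dominate transformer training. The key insight behind Adam-style methods is that a running estimate of each gradient's second moment (its uncentered variance) yields an effective per-parameter learning rate: parameters with consistently large gradients take proportionally smaller steps. Lion is the exception in this group, since it steps along the sign of a momentum term instead of tracking a second moment.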
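import torch

model = torch.nn.Linear(16, 4)  # placeholder model (an assumption, not from the survey)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    weight_decay=0.01,  # decoupled weight decay
)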
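To make the per-parameter scaling concrete, here is a minimal sketch of one Adam update in plain Python. It follows the standard published update rule rather than any particular library's internals. The function name `adam_step` and the flat-list representation of parameters are assumptions made for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to flat lists of floats.

    t is the 1-based step count. Returns new (params, m, v) lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for zero init
        v_hat = v_i / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) gives each parameter its own effective step size.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Two parameters whose gradients differ by four orders of magnitude.
params, m, v = adam_step([1.0, 1.0], [0.01, 100.0], [0.0, 0.0], [0.0, 0.0], t=1)
steps = [1.0 - p for p in params]
```

On this first step both entries of `steps` come out at approximately `lr`, even though the raw gradients differ by a factor of 10,000. Plain SGD would instead scale each step with the gradient itself.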
2. Quantization Strategies
Post-training quantization (PTQ) reduces model precision from FP32 to INT8 or even INT4 without retraining and with minimal accuracy loss. GPTQ and AWQ have emerged as the dominant PTQ approaches for LLMs, achieving a 4x memory reduction with less than 1% accuracy degradation on standard benchmarks.
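The basic mechanics of PTQ can be sketched with naive symmetric round-to-nearest quantization, shown below in plain Python. This is not GPTQ or AWQ, which use calibration data to choose quantized values more carefully; it only illustrates how floats map to 8-bit integers plus one scale factor. The function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> ints in [-127, 127] plus a scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map integer levels back to approximate float values."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]  # made-up values for illustration
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in 8 bits instead of 32, which is where a 4x reduction in weight memory comes from when starting at FP32. The rounding error per weight is bounded by half of `scale`, so a single outlier weight that inflates `scale` degrades precision for every other weight in the tensor.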