Learning dynamics of LLM finetuning
Best AI papers explained - A podcast by Enoch H. Kang

This academic paper presents a novel framework for understanding how Large Language Models (LLMs) evolve during finetuning by analyzing their step-by-step learning dynamics, in contrast with previous approaches that focus on training objectives or end-states. The authors formalize the per-step change in model prediction as a decomposition into three key terms, and the decomposition adapts to various finetuning algorithms such as Supervised Finetuning (SFT) and Direct Preference Optimization (DPO). A central finding is the "squeezing effect" caused by negative gradients during preference tuning: pushing down a disliked response reduces the probability of most other responses as well, and the effect is especially pronounced when the model is already confident or when finetuning is off-policy. The framework is validated through experiments on both the MNIST dataset and LLM finetuning, where it explains counter-intuitive phenomena such as the confidence decay observed in DPO. Finally, the analysis motivates a simple yet effective method to improve alignment performance by mitigating the harmful aspects of the squeezing effect.
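To make the squeezing effect concrete, here is a minimal NumPy sketch (not taken from the paper) of the underlying softmax behaviour: applying a gradient step that lowers the log-probability of a low-probability response redistributes mass mostly onto the already-dominant response, so the probabilities of most other responses shrink. The toy vocabulary size, learning rate, and logit values are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
V = 10                       # toy "vocabulary" of candidate responses (assumed size)
z = rng.normal(size=V)       # logits of a toy model
z[0] += 3.0                  # make response 0 clearly dominant, i.e. a confident model
pi = softmax(z)

y_neg = int(np.argsort(pi)[1])   # pick a low-probability response to push down
eta = 1.0                        # illustrative step size

# Gradient-descent step that decreases log pi(y_neg) (the "negative gradient" term):
# d/dz log pi(y_neg) = e_{y_neg} - pi, so the update is z - eta * (e_{y_neg} - pi).
one_hot = np.zeros(V)
one_hot[y_neg] = 1.0
z_new = z - eta * (one_hot - pi)
pi_new = softmax(z_new)

print("p(argmax) before/after:", pi.max(), pi_new.max())
print("responses whose probability decreased:", int((pi_new < pi).sum()), "of", V)
```

Running this shows the dominant response gaining probability while most of the remaining responses lose it, which mirrors the confidence decay the paper attributes to negative gradients in preference tuning; with a flatter (less confident) initial distribution the redistribution is much milder.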