Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Best AI papers explained - A podcast by Enoch H. Kang

This paper investigates two major failure modes in the reward learning phase of RLHF: reward overfitting and reward overoptimization, which often arise because the standard cross-entropy loss is ill-suited to imbalanced preference datasets. To address these issues, the paper introduces Iterative Data Smoothing (IDS), an algorithm that mitigates both problems by iteratively replacing hard comparison labels with softer, model-predicted labels during training. Theoretical analysis and empirical results in both multi-armed bandit and neural network settings demonstrate that IDS outperforms standard Maximum Likelihood Estimation (MLE), offering a more robust approach to reward training.
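To make the label-smoothing idea concrete, here is a minimal PyTorch sketch of reward-model training with iteratively smoothed preference labels. The `RewardModel` class, the `train_ids` function, the mixing rate `beta`, and the synthetic data are all hypothetical illustrations; the update rule (blending the current labels with the model's own Bradley-Terry predictions each epoch) follows the description above, and the paper's exact formulation and schedule may differ.

```python
import torch
import torch.nn as nn

# Toy reward model: maps a feature vector for a (prompt, response) pair to a scalar reward.
class RewardModel(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def train_ids(chosen, rejected, epochs=50, beta=0.3, lr=1e-3):
    """Train a reward model with iterative data smoothing (illustrative sketch).

    chosen, rejected: (N, dim) feature tensors for the preferred / dispreferred
    responses in each comparison.
    beta: smoothing rate mixing model predictions into the labels
          (hypothetical parameter; the paper's exact schedule may differ).
    """
    model = RewardModel(chosen.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Start from hard labels: "chosen beats rejected" with probability 1.
    soft_labels = torch.ones(chosen.shape[0])

    for _ in range(epochs):
        # Bradley-Terry probability that "chosen" is preferred under the current model.
        logits = model(chosen) - model(rejected)
        probs = torch.sigmoid(logits)

        # Cross-entropy against the current soft labels instead of the raw hard labels.
        loss = nn.functional.binary_cross_entropy_with_logits(logits, soft_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Iterative data smoothing step: pull labels toward the model's predictions,
        # so rarely compared pairs are not forced toward extreme 0/1 targets.
        with torch.no_grad():
            soft_labels = (1 - beta) * soft_labels + beta * probs.detach()

    return model

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, n = 8, 256
    chosen = torch.randn(n, dim) + 0.5   # synthetic "preferred" features
    rejected = torch.randn(n, dim)       # synthetic "dispreferred" features
    reward_model = train_ids(chosen, rejected)
```

The intuition behind the update is that comparisons seen only a few times no longer push the model toward saturated rewards, which is the mechanism the paper credits for reducing both overfitting and overoptimization.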
