Prompt Curriculum Learning for Efficient LLM Post-Training

Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces Prompt Curriculum Learning (PCL), a novel and efficient reinforcement learning (RL) algorithm for post-training large language models (LLMs), particularly on reasoning tasks. The research first conducts a systematic investigation and finds that the optimal training batch size occurs at the transition point between sublinear and linear generation-time scaling, and that prompts of intermediate difficulty (~50% success rate) yield the highest training efficiency and gradient quality. PCL leverages these findings by using a concurrently updated value model to identify intermediate-difficulty prompts, avoiding the costly rollouts required by prior filtering methods and identifying prompts 12.1x and 16.9x faster on two benchmarks. Empirical results show that PCL consistently achieves high performance in less training time than existing baselines, while progressively focusing on harder prompts as the model improves.
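The selection step at the heart of PCL can be sketched simply: rather than generating rollouts to measure each prompt's pass rate, a value model predicts the policy's success probability for every candidate prompt, and the training batch is built from the prompts whose prediction is closest to 50%. The Python sketch below is a hypothetical illustration of that idea; the `ValueModel` stub, the embedding inputs, and the function names are assumptions for demonstration, not the paper's actual interface.

```python
import torch
import torch.nn as nn

# Hypothetical value model: maps a prompt representation to a logit whose
# sigmoid estimates the current policy's success rate on that prompt.
# In the paper the value model is updated concurrently with the policy;
# this stub only stands in for its scoring interface.
class ValueModel(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, prompt_embeddings):
        return self.head(prompt_embeddings).squeeze(-1)  # one logit per prompt

def select_intermediate_prompts(prompt_embeddings, value_model,
                                batch_size=32, target=0.5):
    """Pick the prompts whose predicted success rate is closest to ~50%,
    avoiding the rollout-based filtering used by prior methods."""
    with torch.no_grad():
        p_success = torch.sigmoid(value_model(prompt_embeddings))
    distance = (p_success - target).abs()
    idx = torch.argsort(distance)[:batch_size]  # most informative prompts first
    return idx, p_success[idx]

# Usage: score a pool of candidate prompts, keep the intermediate-difficulty ones.
pool = torch.randn(1000, 128)  # stand-in embeddings for 1000 candidate prompts
vm = ValueModel()
chosen, rates = select_intermediate_prompts(pool, vm)
print(chosen[:5], rates[:5])
```

The efficiency argument is that scoring a prompt costs one forward pass of a small value model, whereas rollout-based filtering costs full LLM generations per prompt; under this reading, that gap is the source of the reported 12.1x and 16.9x speedups in prompt identification.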
