Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces PrefEval, a benchmark for assessing how well Large Language Models (LLMs) can infer, remember, and follow user preferences across long, multi-session conversations. An evaluation of 10 LLMs on the benchmark revealed that current state-of-the-art models struggle to proactively follow user preferences, with zero-shot accuracy dropping below 10% after only a small number of conversation turns. The researchers conclude that while fine-tuning on PrefEval improves results, LLMs still face substantial challenges in personalized conversation.