Checklists Are Better Than Reward Models For Aligning Language Models
- URL: http://arxiv.org/abs/2507.18624v1
- Date: Thu, 24 Jul 2025 17:58:00 GMT
- Title: Checklists Are Better Than Reward Models For Aligning Language Models
- Authors: Vijay Viswanathan, Yanchao Sun, Shuang Ma, Xiang Kong, Meng Cao, Graham Neubig, Tongshuang Wu
- Abstract summary: We propose "Reinforcement Learning from Checklist Feedback" (RLCF). From instructions, we extract checklists and evaluate how well responses satisfy each item. Using both AI judges and specialized verifier programs, we combine these scores to compute rewards for RL.
- Score: 99.1896531064102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as "helpfulness" and "harmfulness". In our work, we instead propose using flexible, instruction-specific criteria as a means of broadening the impact that reinforcement learning can have in eliciting instruction following. We propose "Reinforcement Learning from Checklist Feedback" (RLCF). From instructions, we extract checklists and evaluate how well responses satisfy each item -- using both AI judges and specialized verifier programs -- then combine these scores to compute rewards for RL. We compare RLCF with other alignment methods applied to a strong instruction-following model (Qwen2.5-7B-Instruct) on five widely studied benchmarks -- RLCF is the only method to improve performance on every benchmark, including a 4-point boost in hard satisfaction rate on FollowBench, a 6-point increase on InFoBench, and a 3-point rise in win rate on Arena-Hard. These results establish checklist feedback as a key tool for improving language models' support of queries that express a multitude of needs.
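The abstract describes the RLCF reward pipeline only at a high level: extract a checklist from each instruction, score the response against every item with an AI judge or a specialized verifier program, and combine the per-item scores into a scalar RL reward. The Python sketch below illustrates one way such a pipeline could be wired together; it is an assumption-laden illustration, not the authors' implementation. The names (`ChecklistItem`, `extract_checklist`, `ai_judge_score`, `checklist_reward`), the hand-written checklist, the stubbed judge, and the mean-based score combination are all hypothetical.

```python
"""Minimal sketch of RLCF-style reward computation, based only on the
abstract. Checklist extraction, AI-judge scoring, and the combination
rule are all assumptions, not the paper's actual implementation."""

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChecklistItem:
    """One instruction-specific criterion extracted from a user instruction."""
    description: str
    # Optional deterministic verifier program; when present, it is used
    # instead of the (noisier) AI judge.
    verifier: Optional[Callable[[str], float]] = None


def extract_checklist(instruction: str) -> list[ChecklistItem]:
    """Hypothetical extractor. In practice an LLM would decompose the
    instruction into criteria; here a checklist is hand-written for the
    demo instruction below."""
    return [
        ChecklistItem("Response is written in English."),
        ChecklistItem(
            "Response contains at most 50 words.",
            verifier=lambda resp: float(len(resp.split()) <= 50),
        ),
        ChecklistItem("Response directly answers the question."),
    ]


def ai_judge_score(item: ChecklistItem, instruction: str, response: str) -> float:
    """Stub for an LLM judge rating satisfaction of `item` on a 0-1 scale.
    A real implementation would prompt a judge model; this trivial check
    exists only to keep the sketch runnable."""
    return 1.0 if response.strip() else 0.0


def checklist_reward(instruction: str, response: str) -> float:
    """Combine per-item scores into one scalar reward. The abstract says
    judge and verifier scores are combined; an unweighted mean is assumed
    here for simplicity."""
    items = extract_checklist(instruction)
    scores = [
        item.verifier(response) if item.verifier is not None
        else ai_judge_score(item, instruction, response)
        for item in items
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    instruction = "In 50 words or fewer, explain what a checklist reward is."
    response = ("A checklist reward scores a response against criteria "
                "derived from the instruction and averages the results.")
    print(f"reward = {checklist_reward(instruction, response):.2f}")
```

In an actual RLCF setup, `checklist_reward` would supply the scalar reward for a policy-gradient update (e.g., PPO) on responses sampled from the model being aligned.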
Related papers
- VerIF: Verification Engineering for Reinforcement Learning in Instruction Following [55.60192044049083]
Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs). We propose VerIF, a verification method that combines rule-based code verification with LLM-based verification from a large reasoning model. We apply RL training with VerIF to two models, achieving significant improvements across several representative instruction-following benchmarks. (A minimal sketch of this rule-plus-judge reward combination appears after this list.)
arXiv Detail & Related papers (2025-06-11T17:10:36Z)
- Towards Better Instruction Following Retrieval Models [30.99867106106421]
We introduce InF-IR, a large-scale, high-quality training corpus tailored for enhancing retrieval models in instruction-following IR. InF-IR expands traditional training pairs into over 38,000 expressive <instruction, query, passage> triplets as positive samples. We generate two additional hard negative examples by poisoning both instructions and queries; these are then rigorously validated by an advanced reasoning model (o3-mini) to ensure semantic plausibility while maintaining instructional incorrectness.
arXiv Detail & Related papers (2025-05-27T17:14:37Z)
- REARANK: Reasoning Re-ranking Agent via Reinforcement Learning [69.8397511935806]
We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. REARANK explicitly reasons before reranking, significantly improving both performance and interpretability.
arXiv Detail & Related papers (2025-05-26T14:31:48Z)
- SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data [65.56911325914582]
We propose Self-play Reinforcement Learning (SeRL) to bootstrap Large Language Model (LLM) training with limited initial data. The proposed SeRL yields results superior to its counterparts and achieves performance on par with that obtained from high-quality data with verifiable rewards.
arXiv Detail & Related papers (2025-05-25T13:28:04Z)
- A Critical Evaluation of AI Feedback for Aligning Large Language Models [60.42291111149438]
We show that simple supervised fine-tuning with GPT-4 as the teacher outperforms existing RLAIF pipelines.
More generally, we find that the gains from RLAIF vary substantially across base model families, test-time evaluation protocols, and critic models.
arXiv Detail & Related papers (2024-02-19T18:53:54Z)
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers [84.9120606803906]
Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback.
In this paper, we consider an alternative approach: converting feedback into instructions by relabeling the original instruction, then training the model for better alignment in a supervised manner.
We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions.
arXiv Detail & Related papers (2023-02-10T12:16:38Z)
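Several of the papers above, VerIF in particular, combine hard rule-based verification with soft LLM-based judging to form a reward. The sketch below shows one plausible gated combination under assumed constraints; the specific checks (`rule_checks`), the stubbed `llm_judge`, and the gating rule are illustrative assumptions, not VerIF's actual design.

```python
"""Minimal sketch of combining rule-based code verification with an
LLM-judge score, in the spirit of VerIF. The rules, the stubbed judge,
and the gating combination are assumptions for illustration only."""

import re


def rule_checks(response: str) -> bool:
    """Hard, programmatically verifiable constraints (assumed examples:
    a 100-word budget and a required keyword)."""
    within_budget = len(response.split()) <= 100
    has_keyword = re.search(r"\bsummary\b", response, re.IGNORECASE) is not None
    return within_budget and has_keyword


def llm_judge(instruction: str, response: str) -> float:
    """Stub for a large reasoning model scoring instruction adherence on
    [0, 1]; a real system would call a judge model here."""
    return 0.8  # placeholder score


def combined_reward(instruction: str, response: str) -> float:
    """Assumed gating rule: any hard-rule failure zeroes the reward;
    otherwise the soft judge score is used."""
    return llm_judge(instruction, response) if rule_checks(response) else 0.0
```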
This list is automatically generated from the titles and abstracts of the papers on this site.