Generalizing Verifiable Instruction Following
- URL: http://arxiv.org/abs/2507.02833v2
- Date: Mon, 04 Aug 2025 11:54:59 GMT
- Title: Generalizing Verifiable Instruction Following
- Authors: Valentina Pyatkin, Saumya Malik, Victoria Graf, Hamish Ivison, Shengyi Huang, Pradeep Dasigi, Nathan Lambert, Hannaneh Hajishirzi
- Abstract summary: A crucial factor for successful human and AI interaction is the ability of language models to follow human instructions precisely. We find that most models strongly overfit on a small set of verifiable constraints from the benchmarks that test these abilities. We introduce a new benchmark, IFBench, to evaluate precise instruction following generalization on 58 new, diverse, and challenging verifiable out-of-domain constraints.
- Score: 44.02178200187706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A crucial factor for successful human and AI interaction is the ability of language models or chatbots to follow human instructions precisely. A common feature of instructions is output constraints like "only answer with yes or no" or "mention the word 'abrakadabra' at least 3 times" that the user adds to craft a more useful answer. Even today's strongest models struggle to fulfill such constraints. We find that most models strongly overfit on a small set of verifiable constraints from the benchmarks that test these abilities, a skill called precise instruction following, and are not able to generalize well to unseen output constraints. We introduce a new benchmark, IFBench, to evaluate precise instruction following generalization on 58 new, diverse, and challenging verifiable out-of-domain constraints. In addition, we perform an extensive analysis of how and on what data models can be trained to improve precise instruction following generalization. Specifically, we carefully design constraint verification modules and show that reinforcement learning with verifiable rewards (RLVR) significantly improves instruction following. In addition to IFBench, we release 29 additional new hand-annotated training constraints and verification functions, RLVR training prompts, and code.
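To make the idea of a constraint verification module concrete, below is a minimal sketch in Python (hypothetical function names and constraints; not the released IFBench or RLVR code) that checks two output constraints of the kind quoted in the abstract and collapses the results into a binary verifiable reward:

```python
import re

def verify_yes_no_only(response: str) -> bool:
    # Constraint: "only answer with yes or no".
    return response.strip().strip(".!").lower() in {"yes", "no"}

def verify_min_mentions(response: str, word: str, min_count: int) -> bool:
    # Constraint: "mention the word <word> at least <min_count> times".
    matches = re.findall(rf"\b{re.escape(word)}\b", response, flags=re.IGNORECASE)
    return len(matches) >= min_count

def verifiable_reward(response: str, checks) -> float:
    # Binary RLVR-style reward: 1.0 only if every attached constraint passes.
    return 1.0 if all(check(response) for check in checks) else 0.0

# Example: an instruction that attaches two verifiable output constraints.
checks = [
    lambda r: verify_min_mentions(r, "abrakadabra", 3),
    lambda r: len(r.split()) <= 50,  # e.g. "answer in at most 50 words"
]
print(verifiable_reward("Abrakadabra! I say abrakadabra, then abrakadabra once more.", checks))  # 1.0
print(verify_yes_no_only("Yes."))  # True
```

In an RLVR setup of the kind the paper describes, such a programmatic reward would be computed on each sampled response and used directly in place of a learned reward model.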
Related papers
- IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards [22.802937805177773]
Instruction Following Decorator (IFDecorator) is a framework that wraps RLVR training into a robust and sample-efficient pipeline. Our Qwen2.5-32B-Instruct-IFDecorator achieves 87.43% accuracy on IFEval, outperforming larger proprietary models such as GPT-4o. Our trip wires show significant reductions in reward hacking rates.
arXiv Detail & Related papers (2025-08-06T17:00:54Z)
- Checklists Are Better Than Reward Models For Aligning Language Models [99.1896531064102]
We propose "Reinforcement Learning from Checklist Feedback" (RLCF)<n>From instructions, we extract checklists and evaluate how well responses satisfy each item.<n>Using both AI judges and specialized verifier programs, we combine these scores to compute rewards for RL.
arXiv Detail & Related papers (2025-07-24T17:58:00Z)
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models [27.142703756752997]
We introduce MathIF, a benchmark for evaluating instruction-following in mathematical reasoning tasks. Our empirical analysis reveals a consistent tension between scaling up reasoning capacity and maintaining controllability. We show that even simple interventions can partially recover obedience, though at the cost of reasoning performance.
arXiv Detail & Related papers (2025-05-20T18:18:01Z)
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models [48.361839372110246]
We develop an automated instruction generation pipeline that performs constraint expansion, conflict detection, and instruction rewriting. We evaluate 19 large language models and uncover substantial variation in performance across constraint forms. In-depth analysis indicates that these gains stem primarily from modifications to the parameters of the model's attention modules.
arXiv Detail & Related papers (2025-05-12T14:16:55Z)
- LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. We aim to bridge this gap by designing a simple CL method that is theoretically sound and highly performant.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
- From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification [70.08146540745877]
We investigate common constraints in NLP tasks and categorize them into three classes based on the types of their arguments.
We propose a unified framework, ACT (Aligning to ConsTraints), to automatically produce supervision signals for user alignment with constraints.
arXiv Detail & Related papers (2024-03-10T22:14:54Z)
- Nevermind: Instruction Override and Moderation in Large Language Models [2.0935496890864207]
We investigate and benchmark the most popular proprietary and different-sized open-source models on the task of explicit instruction following in conflicting situations.
We observe that improving instruction following, and subsequently instruction overrides/jailbreaks, is fundamentally at odds with a language model's ability to follow given safety filters or guidelines.
arXiv Detail & Related papers (2024-02-05T18:58:19Z)
- Improving Long-Horizon Imitation Through Instruction Prediction [93.47416552953075]
In this work, we explore the use of an often unused source of auxiliary supervision: language.
Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction.
In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans.
arXiv Detail & Related papers (2023-06-21T20:47:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.