RECAST: Strengthening LLMs' Complex Instruction Following with Constraint-Verifiable Data
- URL: http://arxiv.org/abs/2505.19030v2
- Date: Tue, 27 May 2025 07:53:33 GMT
- Title: RECAST: Strengthening LLMs' Complex Instruction Following with Constraint-Verifiable Data
- Authors: Wenhao Liu, Zhengkang Guo, Mingchen Xie, Jingwen Xu, Zisu Huang, Muzhao Tian, Jianhan Xu, Muling Wu, Xiaohua Wang, Changze Lv, He-Da Wang, Hu Yao, Xiaoqing Zheng, Xuanjing Huang
- Abstract summary: RECAST is a novel framework for synthesizing datasets where each example incorporates far more constraints than those in existing benchmarks. We construct RECAST-30K, a large-scale, high-quality dataset comprising 30k instances spanning 15 constraint types. Experimental results demonstrate that models fine-tuned on RECAST-30K show substantial improvements in following complex instructions.
- Score: 37.631782007066214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly expected to tackle complex tasks, driven by their expanding applications and users' growing proficiency in crafting sophisticated prompts. However, as the number of explicitly stated requirements increases (particularly more than 10 constraints), LLMs often struggle to accurately follow such complex instructions. To address this challenge, we propose RECAST, a novel framework for synthesizing datasets where each example incorporates far more constraints than those in existing benchmarks. These constraints are extracted from real-world prompt-response pairs to ensure practical relevance. RECAST enables automatic verification of constraint satisfaction via rule-based validators for quantitative constraints and LLM-based validators for qualitative ones. Using this framework, we construct RECAST-30K, a large-scale, high-quality dataset comprising 30k instances spanning 15 constraint types. Experimental results demonstrate that models fine-tuned on RECAST-30K show substantial improvements in following complex instructions. Moreover, the verifiability provided by RECAST enables the design of reward functions for reinforcement learning, which further boosts model performance on complex and challenging tasks.
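The abstract describes rule-based validators for quantitative constraints, whose pass rates can then serve as a reinforcement-learning reward. A minimal illustrative sketch of that idea follows; all function names and the constraint set are hypothetical, not taken from the RECAST implementation.

```python
import re

# Hypothetical rule-based validators for quantitative constraints,
# in the spirit of RECAST's verifiable constraints (names and
# signatures are illustrative, not the paper's actual code).

def check_max_words(response: str, limit: int) -> bool:
    """Constraint: the response contains at most `limit` words."""
    return len(response.split()) <= limit

def check_num_bullets(response: str, n: int) -> bool:
    """Constraint: the response contains exactly `n` bullet lines."""
    return len(re.findall(r"^\s*[-*] ", response, flags=re.MULTILINE)) == n

def check_keyword(response: str, keyword: str) -> bool:
    """Constraint: the response mentions `keyword` at least once."""
    return keyword.lower() in response.lower()

def constraint_reward(response: str, validators) -> float:
    """Fraction of satisfied constraints -- usable as an RL reward."""
    results = [v(response) for v in validators]
    return sum(results) / len(results)

# Toy example: three constraints attached to one instruction.
response = "- RECAST extracts constraints\n- RECAST verifies them"
validators = [
    lambda r: check_max_words(r, 20),
    lambda r: check_num_bullets(r, 2),
    lambda r: check_keyword(r, "RECAST"),
]
print(constraint_reward(response, validators))  # 1.0
```

Qualitative constraints (tone, style, helpfulness) have no such closed-form check; per the abstract, RECAST handles those with LLM-based validators instead.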
Related papers
- Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following [10.119219532863767]
Lazy reasoning during the thinking stage is the primary factor contributing to poor instruction adherence. We propose a comprehensive framework designed to enable rigorous reasoning processes involving preview and self-checking. Our Light-IF-32B model surpasses both larger open-source models such as DeepSeek-R1 and closed-source models like Doubao-1.6.
arXiv Detail & Related papers (2025-08-05T07:42:00Z)
- EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models [65.48902212293903]
We present the Extremely Complex Instruction Following Benchmark (EIFBENCH) for evaluating large language models (LLMs). EIFBENCH includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently. We also propose the Segment Policy Optimization (SegPO) algorithm to enhance LLMs' ability to accurately fulfill multi-task workflows.
arXiv Detail & Related papers (2025-06-10T02:39:55Z)
- LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations. This structured approach allows Large Language Models (LLMs) to generate and execute SQL queries, enhancing generalization and mitigating biases.
arXiv Detail & Related papers (2025-06-06T05:14:04Z)
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models [48.361839372110246]
We develop an automated instruction generation pipeline that performs constraint expansion, conflict detection, and instruction rewriting. We evaluate 19 large language models and uncover substantial variation in performance across constraint forms. In-depth analysis indicates that these gains stem primarily from modifications to the parameters of the model's attention modules.
arXiv Detail & Related papers (2025-05-12T14:16:55Z)
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models [55.60192044049083]
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc. Previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. We propose a novel data generation technique, constraint back-translation.
arXiv Detail & Related papers (2024-10-31T17:42:26Z)
- Divide-Verify-Refine: Can LLMs Self-Align with Complex Instructions? [33.18076221854853]
We propose a framework to divide complex instructions into single constraints and prepare appropriate tools. We then verify responses using tools that provide rigorous checks and textual guidance. To maximize refinement effectiveness, we propose dynamic few-shot prompting, where a refinement repository collects successful refinements.
arXiv Detail & Related papers (2024-10-16T04:01:55Z)
- The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests [0.6249768559720121]
We develop and release a novel Arithmetic Constraint-Satisfaction (ACS) benchmarking dataset.
This dataset consists of complex user requests with corresponding constraints, agent responses and human labels indicating each constraint's satisfaction level in the response.
We show that most models still have significant headroom for improvement and that errors primarily stem from reasoning issues.
arXiv Detail & Related papers (2024-09-22T09:27:42Z)
- Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models [23.17547206140014]
We introduce Conifer, an instruction tuning dataset for large language models.
We train models with Conifer to follow instructions with complex constraints.
On several instruction-following benchmarks, our 7B model outperforms the state-of-the-art open-source 7B models.
arXiv Detail & Related papers (2024-04-03T15:55:39Z)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)