Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
- URL: http://arxiv.org/abs/2501.04945v3
- Date: Sun, 16 Feb 2025 23:36:29 GMT
- Title: Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
- Authors: Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu
- Abstract summary: It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints.
We design a pipeline to construct datasets with high-quality outputs automatically.
To fully utilize the positive and negative samples generated during the data construction process, we choose Direct Preference Optimization (DPO) as the training method.
We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability.
- Abstract: It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints. However, enhancing LLMs' ability to follow soft constraints remains an unexplored area. To bridge the gap, we initially design a pipeline to construct datasets with high-quality outputs automatically. Additionally, to fully utilize the positive and negative samples generated during the data construction process, we choose Direct Preference Optimization (DPO) as the training method. Furthermore, taking into account the difficulty of soft constraints indicated by the number of constraints, we design a curriculum learning training paradigm based on the constraint quantity. We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability and analyze the factors driving the improvements. The datasets and code are publicly available at https://github.com/Rainier-rq/FollowSoftConstraint.
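A minimal sketch of the training recipe the abstract describes, assuming the standard DPO objective and a curriculum that groups preference pairs by constraint count; `PreferencePair` and all field names below are illustrative, not the paper's actual data schema:

```python
# Hedged sketch of the recipe above, assuming the standard DPO objective;
# PreferencePair and all field names are illustrative, not the paper's schema.
import math
from dataclasses import dataclass

@dataclass
class PreferencePair:
    instruction: str
    chosen: str         # output that satisfies the soft constraints
    rejected: str       # output that violates at least one constraint
    n_constraints: int  # difficulty proxy used for the curriculum

def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss on summed log-probs from the policy and a frozen reference."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

def curriculum_stages(pairs: list[PreferencePair]):
    """Yield training stages ordered by constraint count, easy to hard."""
    for k in sorted({p.n_constraints for p in pairs}):
        yield k, [p for p in pairs if p.n_constraints == k]
```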
Related papers
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models [55.60192044049083]
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc.
Previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs.
We propose a novel data generation technique, constraint back-translation.
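A hedged sketch of the back-translation idea: rather than generating responses to complex instructions, derive constraints that an existing high-quality response already satisfies and fold them back into the instruction. `call_llm` is a placeholder for any chat-completion client, and the prompt wording is our assumption:

```python
# Illustrative sketch only: `call_llm` stands in for any chat-completion client,
# and the prompt wording is an assumption, not the paper's.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def back_translate(instruction: str, response: str) -> str:
    """Derive constraints an existing response already satisfies, then
    fold them back into the instruction; no new response generation needed."""
    constraints = call_llm(
        "List the format, length, and style constraints that the following "
        f"response already satisfies, one per line.\n\nResponse:\n{response}"
    )
    return f"{instruction}\n\nAdditionally, satisfy these constraints:\n{constraints}"
```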
arXiv Detail & Related papers (2024-10-31T17:42:26Z)
- Divide-Verify-Refine: Aligning LLM Responses with Complex Instructions [33.18076221854853]
Recent studies show that LLMs, particularly open-source models, struggle to follow complex instructions with multiple constraints.
We propose the Divide-Verify-Refine (DVR) framework with three steps.
We show that the framework significantly improves performance, doubling Llama3.1-8B's constraint adherence on instructions with six constraints.
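A rough sketch of a Divide-Verify-Refine style loop (function names and the checker set are ours, not the paper's API): split the instruction into verifiable constraints, check each programmatically, and ask the model to refine on failures:

```python
# Rough sketch; function names and checkers are ours, not the paper's API.
from typing import Callable, Dict

def divide_verify_refine(instruction: str,
                         generate: Callable[[str], str],
                         checkers: Dict[str, Callable[[str], bool]],
                         max_rounds: int = 3) -> str:
    response = generate(instruction)
    for _ in range(max_rounds):
        failed = [name for name, check in checkers.items() if not check(response)]
        if not failed:
            break  # every constraint verified
        response = generate(
            f"{instruction}\n\nYour previous answer violated: {', '.join(failed)}.\n"
            f"Previous answer:\n{response}\n\nRevise it to satisfy every constraint."
        )
    return response

# Two cheap programmatic checkers for common hard constraints:
example_checkers = {
    "at most 50 words": lambda r: len(r.split()) <= 50,
    "contains a bullet list": lambda r: "- " in r,
}
```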
arXiv Detail & Related papers (2024-10-16T04:01:55Z)
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
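One plausible reading of the constraint learning schema, sketched under the assumption that a sequence-level attribute score (e.g., a toxicity classifier's output) is penalized whenever it exceeds a budget; the penalty form is illustrative and may differ from the paper's:

```python
# Illustrative penalty form only; the paper's exact constraint schema may differ.
def constrained_loss(lm_loss: float, attribute_score: float,
                     budget: float = 0.1, lam: float = 5.0) -> float:
    """Language-modeling loss plus a hinge penalty whenever a sequence-level
    attribute score (e.g., a toxicity classifier's output) exceeds a budget."""
    violation = max(0.0, attribute_score - budget)
    return lm_loss + lam * violation
```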
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
- CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP [46.95923453967386]
CoDa is a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP.
Our approach is based on prompting off-the-shelf instruction-following Large Language Models.
CoDa is the first framework that provides users explicit control over the augmentation generation process.
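A sketch of what a CoDa-style, training-free augmentation prompt could look like: user constraints are verbalized into a single instruction for an off-the-shelf model. The prompt wording is our assumption:

```python
# Prompt wording is our assumption; the point is that augmentation reduces to
# verbalizing constraints for an off-the-shelf instruction-following model.
def coda_prompt(task: str, constraints: list[str], n_samples: int = 5) -> str:
    lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Generate {n_samples} training examples for the task: {task}.\n"
        f"Every example must satisfy all of these constraints:\n{lines}"
    )

print(coda_prompt("restaurant review sentiment classification",
                  ["each review is 2-3 sentences",
                   "label each review positive or negative"]))
```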
arXiv Detail & Related papers (2024-03-30T16:47:06Z)
- FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models [79.62191017182518]
FollowBench is a multi-level, fine-grained constraint-following benchmark for large language models.
We introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each increased level.
By evaluating 13 popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.
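The multi-level mechanism is easy to make concrete: level k of an evaluation instance is the initial instruction plus the first k constraints, so each level adds exactly one constraint to the previous one:

```python
# Level k of an instance is the initial instruction plus the first k
# constraints, so each level adds exactly one constraint to the previous one.
def build_levels(initial_instruction: str, constraints: list[str]) -> list[str]:
    levels = []
    for k in range(1, len(constraints) + 1):
        added = "\n".join(f"- {c}" for c in constraints[:k])
        levels.append(f"{initial_instruction}\nConstraints:\n{added}")
    return levels
```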
arXiv Detail & Related papers (2023-10-31T12:32:38Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
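To make the latent-variable view concrete: the chain of thought z is latent, and the intractable target is the posterior p(z | x, y) ∝ p(z | x) p(y | x, z). Below is a self-normalized importance-sampling baseline for that target, not the paper's amortized fine-tuning method, included only to pin down what is being sampled:

```python
# Baseline illustration only (self-normalized importance sampling), NOT the
# paper's amortized method; it just makes the target distribution concrete:
# p(z | x, y) is proportional to p(z | x) * p(y | x, z).
import math
import random

def posterior_sample(x, y, propose_z, logp_y_given_xz, n: int = 100):
    zs = [propose_z(x) for _ in range(n)]          # draws from p(z | x)
    logw = [logp_y_given_xz(x, y, z) for z in zs]  # importance log-weights
    m = max(logw)
    weights = [math.exp(lw - m) for lw in logw]    # stable normalization
    r, acc = random.random() * sum(weights), 0.0
    for z, w in zip(zs, weights):
        acc += w
        if acc >= r:
            return z
    return zs[-1]
```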
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
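The IFD metric admits a compact sketch: the ratio of the model's average token loss on the answer given the instruction to its loss on the answer alone, so values near or above 1 mean the instruction barely helps the model produce the answer. `avg_token_loss` is a placeholder for a per-token cross-entropy computed with the language model being scored:

```python
# `avg_token_loss` is a placeholder for mean cross-entropy over answer tokens,
# computed with the language model being scored.
def avg_token_loss(context: str, answer: str) -> float:
    raise NotImplementedError("score answer tokens with your LM here")

def ifd_score(instruction: str, answer: str) -> float:
    conditioned = avg_token_loss(instruction, answer)  # s(A | Q)
    direct = avg_token_loss("", answer)                # s(A)
    return conditioned / direct  # near or above 1: the instruction barely helps
```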
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Teaching the Old Dog New Tricks: Supervised Learning with Constraints [18.88930622054883]
Adding constraint support in Machine Learning has the potential to address outstanding issues in data-driven AI systems.
Existing approaches typically apply constrained optimization techniques to ML training, enforce constraint satisfaction by adjusting the model design, or use constraints to correct the output.
Here, we investigate a different, complementary, strategy based on "teaching" constraint satisfaction to a supervised ML method via the direct use of a state-of-the-art constraint solver.
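A hedged sketch of the "teaching" strategy: alternate supervised training with a solver step that projects the model's current predictions onto the feasible region, using the repaired outputs as the next round's targets. `fit`, `predict`, and `project_feasible` are placeholders, not the paper's API:

```python
# `fit`, `predict`, and `project_feasible` are placeholders, not the paper's API.
def train_with_solver(fit, predict, project_feasible, X, y, rounds: int = 5):
    """Alternate supervised training with a constraint-solver step that
    repairs infeasible predictions; repaired outputs become the next targets."""
    targets = y
    for _ in range(rounds):
        model = fit(X, targets)
        preds = predict(model, X)
        targets = project_feasible(preds)  # solver projects onto the feasible set
    return model
```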
arXiv Detail & Related papers (2020-02-25T09:47:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.