Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
- URL: http://arxiv.org/abs/2501.04945v3
- Date: Sun, 16 Feb 2025 23:36:29 GMT
- Title: Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
- Authors: Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu
- Abstract summary: It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints.
We design a pipeline to construct datasets with high-quality outputs automatically.
To fully utilize the positive and negative samples generated during the data construction process, we choose Direct Preference Optimization (DPO) as the training method.
We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability.
- Abstract: It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints. However, enhancing LLMs' ability to follow soft constraints remains an unexplored area. To bridge the gap, we initially design a pipeline to construct datasets with high-quality outputs automatically. Additionally, to fully utilize the positive and negative samples generated during the data construction process, we choose Direct Preference Optimization (DPO) as the training method. Furthermore, taking into account the difficulty of soft constraints indicated by the number of constraints, we design a curriculum learning training paradigm based on the constraint quantity. We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability and analyze the factors driving the improvements. The datasets and code are publicly available at https://github.com/Rainier-rq/FollowSoftConstraint.
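A minimal sketch of the training recipe the abstract describes, assuming the standard DPO objective and a curriculum that groups preference pairs by constraint count; `PreferencePair` and all field names below are illustrative, not the paper's actual data schema:

```python
# Hedged sketch of the recipe above, assuming the standard DPO objective;
# PreferencePair and all field names are illustrative, not the paper's schema.
import math
from dataclasses import dataclass

@dataclass
class PreferencePair:
    instruction: str
    chosen: str         # output that satisfies the soft constraints
    rejected: str       # output that violates at least one constraint
    n_constraints: int  # difficulty proxy used for the curriculum

def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss on summed log-probs from the policy and a frozen reference."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

def curriculum_stages(pairs: list[PreferencePair]):
    """Yield training stages ordered by constraint count, easy to hard."""
    for k in sorted({p.n_constraints for p in pairs}):
        yield k, [p for p in pairs if p.n_constraints == k]
```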
Related papers
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models [55.60192044049083]
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc.
Previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs.
We propose a novel data generation technique, constraint back-translation.
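A hedged sketch of the back-translation idea: rather than generating responses to complex instructions, derive constraints that an existing high-quality response already satisfies and fold them back into the instruction. `call_llm` is a placeholder for any chat-completion client, and the prompt wording is our assumption:

```python
# Illustrative sketch only: `call_llm` stands in for any chat-completion client,
# and the prompt wording is an assumption, not the paper's.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def back_translate(instruction: str, response: str) -> str:
    """Derive constraints an existing response already satisfies, then
    fold them back into the instruction; no new response generation needed."""
    constraints = call_llm(
        "List the format, length, and style constraints that the following "
        f"response already satisfies, one per line.\n\nResponse:\n{response}"
    )
    return f"{instruction}\n\nAdditionally, satisfy these constraints:\n{constraints}"
```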
arXiv Detail & Related papers (2024-10-31T17:42:26Z)
- Divide-Verify-Refine: Aligning LLM Responses with Complex Instructions [33.18076221854853]
Recent studies show that LLMs, particularly open-source models, struggle to follow complex instructions with multiple constraints.
We propose the Divide-Verify-Refine (DVR) framework with three steps.
We show that the framework significantly improves performance, doubling Llama3.1-8B's constraint adherence on instructions with six constraints.
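A rough sketch of a Divide-Verify-Refine style loop (function names and the checker set are ours, not the paper's API): split the instruction into verifiable constraints, check each programmatically, and ask the model to refine on failures:

```python
# Rough sketch; function names and checkers are ours, not the paper's API.
from typing import Callable, Dict

def divide_verify_refine(instruction: str,
                         generate: Callable[[str], str],
                         checkers: Dict[str, Callable[[str], bool]],
                         max_rounds: int = 3) -> str:
    response = generate(instruction)
    for _ in range(max_rounds):
        failed = [name for name, check in checkers.items() if not check(response)]
        if not failed:
            break  # every constraint verified
        response = generate(
            f"{instruction}\n\nYour previous answer violated: {', '.join(failed)}.\n"
            f"Previous answer:\n{response}\n\nRevise it to satisfy every constraint."
        )
    return response

# Two cheap programmatic checkers for common hard constraints:
example_checkers = {
    "at most 50 words": lambda r: len(r.split()) <= 50,
    "contains a bullet list": lambda r: "- " in r,
}
```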
arXiv Detail & Related papers (2024-10-16T04:01:55Z)
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
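One plausible reading of the constraint learning schema, sketched under the assumption that a sequence-level attribute score (e.g., a toxicity classifier's output) is penalized whenever it exceeds a budget; the penalty form is illustrative and may differ from the paper's:

```python
# Illustrative penalty form only; the paper's exact constraint schema may differ.
def constrained_loss(lm_loss: float, attribute_score: float,
                     budget: float = 0.1, lam: float = 5.0) -> float:
    """Language-modeling loss plus a hinge penalty whenever a sequence-level
    attribute score (e.g., a toxicity classifier's output) exceeds a budget."""
    violation = max(0.0, attribute_score - budget)
    return lm_loss + lam * violation
```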
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
- CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP [46.95923453967386]
CoDa is a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP.
Our approach is based on prompting off-the-shelf instruction-following Large Language Models.
CoDa is the first framework that provides users explicit control over the augmentation generation process.
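A sketch of what a CoDa-style, training-free augmentation prompt could look like: user constraints are verbalized into a single instruction for an off-the-shelf model. The prompt wording is our assumption:

```python
# Prompt wording is our assumption; the point is that augmentation reduces to
# verbalizing constraints for an off-the-shelf instruction-following model.
def coda_prompt(task: str, constraints: list[str], n_samples: int = 5) -> str:
    lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Generate {n_samples} training examples for the task: {task}.\n"
        f"Every example must satisfy all of these constraints:\n{lines}"
    )

print(coda_prompt("restaurant review sentiment classification",
                  ["each review is 2-3 sentences",
                   "label each review positive or negative"]))
```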
arXiv Detail & Related papers (2024-03-30T16:47:06Z)
- FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models [79.62191017182518]
FollowBench is a multi-level, fine-grained constraint-following benchmark for large language models.
We introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each increased level.
By evaluating 13 popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.
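The multi-level mechanism is easy to make concrete: level k of an evaluation instance is the initial instruction plus the first k constraints, so each level adds exactly one constraint to the previous one:

```python
# Level k of an instance is the initial instruction plus the first k
# constraints, so each level adds exactly one constraint to the previous one.
def build_levels(initial_instruction: str, constraints: list[str]) -> list[str]:
    levels = []
    for k in range(1, len(constraints) + 1):
        added = "\n".join(f"- {c}" for c in constraints[:k])
        levels.append(f"{initial_instruction}\nConstraints:\n{added}")
    return levels
```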
arXiv Detail & Related papers (2023-10-31T12:32:38Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
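To make the latent-variable view concrete: the chain of thought z is latent, and the intractable target is the posterior p(z | x, y) ∝ p(z | x) p(y | x, z). Below is a self-normalized importance-sampling baseline for that target, not the paper's amortized fine-tuning method, included only to pin down what is being sampled:

```python
# Baseline illustration only (self-normalized importance sampling), NOT the
# paper's amortized method; it just makes the target distribution concrete:
# p(z | x, y) is proportional to p(z | x) * p(y | x, z).
import math
import random

def posterior_sample(x, y, propose_z, logp_y_given_xz, n: int = 100):
    zs = [propose_z(x) for _ in range(n)]          # draws from p(z | x)
    logw = [logp_y_given_xz(x, y, z) for z in zs]  # importance log-weights
    m = max(logw)
    weights = [math.exp(lw - m) for lw in logw]    # stable normalization
    r, acc = random.random() * sum(weights), 0.0
    for z, w in zip(zs, weights):
        acc += w
        if acc >= r:
            return z
    return zs[-1]
```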
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
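The IFD metric admits a compact sketch: the ratio of the model's average token loss on the answer given the instruction to its loss on the answer alone, so values near or above 1 mean the instruction barely helps the model produce the answer. `avg_token_loss` is a placeholder for a per-token cross-entropy computed with the language model being scored:

```python
# `avg_token_loss` is a placeholder for mean cross-entropy over answer tokens,
# computed with the language model being scored.
def avg_token_loss(context: str, answer: str) -> float:
    raise NotImplementedError("score answer tokens with your LM here")

def ifd_score(instruction: str, answer: str) -> float:
    conditioned = avg_token_loss(instruction, answer)  # s(A | Q)
    direct = avg_token_loss("", answer)                # s(A)
    return conditioned / direct  # near or above 1: the instruction barely helps
```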
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Teaching the Old Dog New Tricks: Supervised Learning with Constraints [18.88930622054883]
Adding constraint support in Machine Learning has the potential to address outstanding issues in data-driven AI systems.
Existing approaches typically apply constrained optimization techniques to ML training, enforce constraint satisfaction by adjusting the model design, or use constraints to correct the output.
Here, we investigate a different, complementary, strategy based on "teaching" constraint satisfaction to a supervised ML method via the direct use of a state-of-the-art constraint solver.
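A hedged sketch of the "teaching" strategy: alternate supervised training with a solver step that projects the model's current predictions onto the feasible region, using the repaired outputs as the next round's targets. `fit`, `predict`, and `project_feasible` are placeholders, not the paper's API:

```python
# `fit`, `predict`, and `project_feasible` are placeholders, not the paper's API.
def train_with_solver(fit, predict, project_feasible, X, y, rounds: int = 5):
    """Alternate supervised training with a constraint-solver step that
    repairs infeasible predictions; repaired outputs become the next targets."""
    targets = y
    for _ in range(rounds):
        model = fit(X, targets)
        preds = predict(model, X)
        targets = project_feasible(preds)  # solver projects onto the feasible set
    return model
```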
arXiv Detail & Related papers (2020-02-25T09:47:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.