RuleR: Improving LLM Controllability by Rule-based Data Recycling
- URL: http://arxiv.org/abs/2406.15938v4
- Date: Sat, 15 Feb 2025 20:33:41 GMT
- Title: RuleR: Improving LLM Controllability by Rule-based Data Recycling
- Authors: Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou,
- Abstract summary: Rule-based Data Recycling (RuleR) is a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules.<n>Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions.<n> Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
- Score: 28.74786215922553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
Related papers
- MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model [54.14155564592936]
We propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM)
MoRE-LLM steers the discovery of local rule-based surrogates during training and their utilization for the classification task.
LLM is responsible for enhancing the domain knowledge alignment of the rules by correcting and contextualizing them.
arXiv Detail & Related papers (2025-03-26T11:09:21Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - Constraint Back-translation Improves Complex Instruction Following of Large Language Models [55.60192044049083]
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc.
Previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs.
We propose a novel data generation technique, constraint back-translation.
arXiv Detail & Related papers (2024-10-31T17:42:26Z) - LLM Unlearning via Loss Adjustment with Only Forget Data [20.310423152885217]
We introduce Forget data only Loss AjustmenT (FLAT), a "flat" loss adjustment approach which addresses these issues.
Empirical results demonstrate that our approach achieves superior unlearning performance compared to existing methods.
arXiv Detail & Related papers (2024-10-14T23:43:33Z) - LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints [86.59857711385833]
We introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions.
To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline.
Our results show that DeCRIM improves Mistral's performance by 7.3% on RealInstruct and 8.0% on IFEval even with weak feedback.
arXiv Detail & Related papers (2024-10-09T01:25:10Z) - RNR: Teaching Large Language Models to Follow Roles and Rules [153.6596303205894]
We propose model, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions.
This data can then be used to train models that follow complex system prompts.
Our framework significantly improves role and rule following capability in large language models.
arXiv Detail & Related papers (2024-09-10T06:07:32Z) - Open-domain Implicit Format Control for Large Language Model Generation [52.83173553689678]
We introduce a novel framework for controlled generation in large language models (LLMs)
This study investigates LLMs' capabilities to follow open-domain, one-shot constraints and replicate the format of the example answers.
We also develop a dataset collection methodology for supervised fine-tuning that enhances the open-domain format control of LLMs without degrading output quality.
arXiv Detail & Related papers (2024-08-08T11:51:45Z) - DELRec: Distilling Sequential Pattern to Enhance LLM-based Recommendation [3.5113201254928117]
Sequential recommendation (SR) tasks enhance recommendation accuracy by capturing the connection between users' past interactions and their changing preferences.
Conventional models often focus solely on capturing sequential patterns within the training data, neglecting the broader context and semantic information embedded in item titles from external sources.
DelRec aims to extract knowledge from SR models and enable LLMs to easily comprehend and utilize this supplementary information for more effective sequential recommendations.
arXiv Detail & Related papers (2024-06-17T02:47:09Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z) - Failures Pave the Way: Enhancing Large Language Models through
Tuning-free Rule Accumulation [11.366334433990588]
Large Language Models (LLMs) have showcased impressive performance.
Due to their inability to capture relationships among samples, these frozen LLMs inevitably keep repeating similar mistakes.
We propose our Tuning-free Rule Accumulation (TRAN) framework, which guides LLMs in improving their performance by learning from previous mistakes.
arXiv Detail & Related papers (2023-10-24T11:40:34Z) - Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning.
We propose a novel method, termed "reflection-tuning"
This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data.
arXiv Detail & Related papers (2023-10-18T05:13:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.