RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
- URL: http://arxiv.org/abs/2504.07282v2
- Date: Mon, 14 Apr 2025 16:23:29 GMT
- Title: RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
- Authors: Lv Qingsong, Yangning Li, Zihua Lan, Zishan Xu, Jiwei Tang, Yinghui Li, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu
- Abstract summary: We propose a task-objective-driven instruction selection framework, RAISE. RAISE incorporates the entire instruction fine-tuning process into optimization. It selects instructions at each step based on their expected impact on model performance improvement.
- Score: 48.63476198469349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the instruction fine-tuning of large language models (LLMs), it has become a consensus that a few high-quality instructions are superior to a large number of low-quality instructions. Many instruction selection methods have been proposed, but most of them select instructions based on heuristic quality metrics and only consider data selection before training. These designs lead to insufficient optimization of instruction fine-tuning, and fixed heuristic indicators are often difficult to optimize for specific tasks. We therefore design a dynamic, task-objective-driven instruction selection framework, RAISE (Reinforced Adaptive Instruction SElection), which incorporates the entire instruction fine-tuning process into optimization, selecting instructions at each step based on their expected impact on model performance improvement. Our approach is interpretable and has strong task-specific optimization capabilities. By modeling dynamic instruction selection as a sequential decision-making process, we use RL to train our selection strategy. Extensive experiments and result analysis demonstrate the superiority of our method compared with other instruction selection methods. Notably, RAISE achieves superior performance by updating only 1% of the training steps compared to full-data training, demonstrating its efficiency and effectiveness.
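The abstract frames instruction selection as a sequential decision process: at each training step a policy picks instructions, and the reward is the resulting improvement on the task objective. The toy sketch below illustrates only that loop structure and is not the authors' implementation. Everything in it is an assumption made for illustration: the scalar model `theta`, the synthetic pool from `make_pool`, the two hand-picked features, the linear policy, and the helper names. A plain random-search update stands in for the RL algorithm, which the abstract does not specify.

```python
import random


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring items."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def make_pool(rng, n_clean=20, n_noisy=20):
    """Hypothetical 'instruction' pool of (x, y, tag) triples.

    Clean items follow y = 2x (tag 1.0); noisy items follow y = -2x (tag 0.0).
    """
    pool = []
    for _ in range(n_clean):
        x = rng.uniform(-1, 1)
        pool.append((x, 2.0 * x, 1.0))
    for _ in range(n_noisy):
        x = rng.uniform(-1, 1)
        pool.append((x, -2.0 * x, 0.0))
    rng.shuffle(pool)
    return pool


def val_loss(theta, val):
    """Mean squared error of the scalar model on the validation set."""
    return sum((theta * x - y) ** 2 for x, y in val) / len(val)


def run_episode(weights, pool, val, steps=10, k=4, lr=0.5):
    """Train theta for `steps` updates, picking k items per step with the policy.

    The per-step reward is the drop in validation loss, so the return equals
    the total validation-loss reduction over the episode.
    """
    theta = 0.0
    total_reward = 0.0
    for _ in range(steps):
        # State-dependent features: source tag and current per-item loss.
        scores = [
            weights[0] * tag + weights[1] * (theta * x - y) ** 2
            for x, y, tag in pool
        ]
        batch = select_top_k(scores, k)
        before = val_loss(theta, val)
        grad = sum(
            2.0 * (theta * pool[i][0] - pool[i][1]) * pool[i][0] for i in batch
        ) / k
        theta -= lr * grad
        total_reward += before - val_loss(theta, val)
    return total_reward


def improve_policy(pool, val, rng, iters=30, sigma=0.5):
    """Random-search stand-in for RL: keep a perturbed policy if its return is higher."""
    weights = [0.0, 0.0]
    best = run_episode(weights, pool, val)
    for _ in range(iters):
        candidate = [w + rng.gauss(0.0, sigma) for w in weights]
        ret = run_episode(candidate, pool, val)
        if ret > best:
            weights, best = candidate, ret
    return weights, best


if __name__ == "__main__":
    rng = random.Random(0)
    pool = make_pool(rng)
    val = [(x / 10.0, 2.0 * x / 10.0) for x in range(-5, 6) if x != 0]
    weights, best = improve_policy(pool, val, rng)
    print("policy weights:", weights, "episode return:", best)
```

Because the reward is measured on a task-specific validation set, the same loop would favor different instructions for a different target task, which is the property the abstract emphasizes over fixed, pre-training heuristics.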
Related papers
- IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection [28.581257601441045]
We introduce IterSelectTune, an efficient, cost-effective iterative training policy for selecting high-quality instruction data.
By fine-tuning on approximately 20% of the source data, our method consistently outperforms models fine-tuned on the full dataset.
arXiv Detail & Related papers (2024-10-17T11:48:57Z)
- Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency [12.145516262749643]
We investigate interaction and dependency patterns between different categories of instructions to fine-tune large language models (LLMs).
Experimental results across different LLMs demonstrate improved performance over strong baselines on widely adopted benchmarks.
arXiv Detail & Related papers (2024-09-11T06:27:50Z)
- Large Language Models Prompting With Episodic Memory [53.8690170372303]
We propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities.
In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory.
Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks.
arXiv Detail & Related papers (2024-08-14T11:19:28Z)
- Automatic Instruction Evolving for Large Language Models [93.52437926313621]
Auto Evol-Instruct is an end-to-end framework that evolves instruction datasets using large language models without any human effort.
Our experiments demonstrate that the best method optimized by Auto Evol-Instruct outperforms human-designed methods on various benchmarks.
arXiv Detail & Related papers (2024-06-02T15:09:00Z)
- Mosaic-IT: Free Compositional Data Augmentation Improves Instruction Tuning [30.82220015525281]
Mosaic Instruction Tuning (Mosaic-IT) is a human/model-free compositional data augmentation method.
Mosaic-IT randomly creates rich and diverse augmentations from existing instruction tuning data.
Our evaluations demonstrate the superior performance and training efficiency of Mosaic-IT.
arXiv Detail & Related papers (2024-05-22T04:08:20Z)
- Instruction Matters: A Simple yet Effective Task Selection for Optimized Instruction Tuning of Specific Tasks [51.15473776489712]
We introduce a simple yet effective task selection method that leverages instruction information alone to identify relevant tasks.
Our method is significantly more efficient than traditional approaches, which require complex measurements of pairwise transferability between tasks or the creation of data samples for the target task.
Experimental results demonstrate that training on a small set of tasks, chosen solely based on the instructions, results in substantial improvements in performance on benchmarks such as P3, Big-Bench, NIV2, and Big-Bench Hard.
arXiv Detail & Related papers (2024-04-25T08:49:47Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Language Model-based Instruction Operators [9.004528034920266]
InstOptima treats instruction generation as an evolutionary multi-objective optimization problem.
We introduce an objective-guided mechanism for operators, allowing the LLM to comprehend the objectives and enhance the quality of the generated instructions.
Experimental results demonstrate improved fine-tuning performance and the generation of a diverse set of high-quality instructions.
arXiv Detail & Related papers (2023-10-26T17:48:45Z)
- Robust Prompt Optimization for Large Language Models Against Distribution Shifts [80.6757997074956]
Large language models (LLMs) have demonstrated significant ability in various natural language processing tasks.
We propose a new problem of robust prompt optimization for LLMs against distribution shifts.
This problem requires that a prompt optimized over the labeled source group also generalize to an unlabeled target group.
arXiv Detail & Related papers (2023-05-23T11:30:43Z)
- Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency.
Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.