Related papers: RAISE: Reinforenced Adaptive Instruction Selection For Large Language Models

URL: http://arxiv.org/abs/2504.07282v2
Date: Mon, 14 Apr 2025 16:23:29 GMT
Title: RAISE: Reinforenced Adaptive Instruction Selection For Large Language Models
Authors: Lv Qingsong, Yangning Li, Zihua Lan, Zishan Xu, Jiwei Tang, Yinghui Li, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu
Abstract summary: We propose a task-objective-driven instruction selection framework RAISE. RAISE incorporates the entire instruction fine-tuning process into optimization. It selects instruction at each step based on the expected impact of instruction on model performance improvement.
Score: 48.63476198469349
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the instruction fine-tuning of large language models (LLMs), it has become a consensus that a few high-quality instructions are superior to a large number of low-quality instructions. At present, many instruction selection methods have been proposed, but most of these methods select instruction based on heuristic quality metrics, and only consider data selection before training. These designs lead to insufficient optimization of instruction fine-tuning, and fixed heuristic indicators are often difficult to optimize for specific tasks. So we designed a dynamic, task-objective-driven instruction selection framework RAISE (Reinforenced Adaptive Instruction SElection), which incorporates the entire instruction fine-tuning process into optimization, selecting instruction at each step based on the expected impact of instruction on model performance improvement. Our approach is well interpretable and has strong task-specific optimization capabilities. By modeling dynamic instruction selection as a sequential decision-making process, we use RL to train our selection strategy. Extensive experiments and result analysis prove the superiority of our method compared with other instruction selection methods. Notably, RAISE achieves superior performance by updating only 1% of the training steps compared to full-data training, demonstrating its efficiency and effectiveness.
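
The abstract's idea of treating instruction selection as a sequential decision process trained with RL can be illustrated with a toy sketch. This is not the RAISE implementation: the abstract does not say which state features or RL algorithm the authors use, so the sketch assumes plain REINFORCE, a single scalar weight `w` standing in for the model, and a squared-error `val_loss` standing in for the task objective. All function names and the instruction pool are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def val_loss(w, target=1.0):
    """Toy stand-in for the task objective: squared distance to the target."""
    return (w - target) ** 2


def train_with_selection(instr_targets, steps=200, lr=0.1, alpha=0.5, seed=0):
    """Run toy fine-tuning while updating the selector with REINFORCE.

    instr_targets[i] is the value that instruction i pulls the toy model toward.
    Returns the final model weight and the selector's final probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(instr_targets)  # selection policy parameters (assumed form)
    w = 0.0                              # toy "model"
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one instruction from the current selection policy.
        a = rng.choices(range(len(probs)), weights=probs)[0]
        before = val_loss(w)
        w += lr * (instr_targets[a] - w)  # one toy training step on that instruction
        reward = before - val_loss(w)     # improvement on the task objective
        # REINFORCE: d log pi(a) / d logit_j = 1[j == a] - probs[j]
        for j in range(len(logits)):
            indicator = 1.0 if j == a else 0.0
            logits[j] += alpha * reward * (indicator - probs[j])
    return w, softmax(logits)


if __name__ == "__main__":
    # Hypothetical pool: two instructions that help the task and two that hurt it.
    w, probs = train_with_selection([1.0, 1.0, -1.0, -1.0])
    print(f"final weight: {w:.3f}")
    print("selection probabilities:", [round(p, 3) for p in probs])
```

The reward here is the per-step change in the validation objective, which mirrors the abstract's "expected impact of instruction on model performance improvement" only loosely; a realistic selector would condition on the training state rather than keep one logit per instruction.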

Related papers

Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning [64.92967672226534]
This paper presents a Competence-Aware Multi-Perspective cUrriculum inStruction tuning framework termed CAMPUS. CAMPUS offers several advantages: Dynamic selection for sub-curriculum, competency-aware adjustment to the curriculum schedule, and multiple difficulty-based scheduling.
arXiv Detail & Related papers (2025-09-17T07:58:59Z)
Bridging Offline and Online Reinforcement Learning for LLMs [71.48552761763158]
We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both.
arXiv Detail & Related papers (2025-06-26T17:25:49Z)
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization [37.54165341391688]
We introduce a novel problem: Sample Scheduling for DPO. We propose SamS, an efficient and effective algorithm that adaptively selects samples in each training batch. This work points to a promising new direction for improving LLM alignment through batch-wise sample selection.
arXiv Detail & Related papers (2025-06-08T10:26:09Z)
Large Language Models are Demonstration Pre-Selectors for Themselves [57.101804269100185]
In-context learning (ICL) with large language models (LLMs) delivers strong few-shot performance by choosing few-shot demonstrations from the entire training data. FEw yet Essential Demonstration prE-selectoR is a novel pre-selection framework that identifies a representative subset of demonstrations. FEw yet Essential Demonstration prE-selectoR can reduce training data size by over 20% while maintaining performance.
arXiv Detail & Related papers (2025-06-06T12:29:03Z)
IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection [28.581257601441045]
We introduce IterSelectTune, an efficient, cost-effective iterative training policy for selecting high-quality instruction data. By fine-tuning on approximately 20% of the source data, our method consistently outperforms models fine-tuned on the full dataset.
arXiv Detail & Related papers (2024-10-17T11:48:57Z)
Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency [12.145516262749643]
We investigate interaction and dependency patterns between different categories of instructions to fine-tune large language models (LLMs). Experimental results across different LLMs demonstrate improved performance over strong baselines on widely adopted benchmarks.
arXiv Detail & Related papers (2024-09-11T06:27:50Z)
Large Language Models Prompting With Episodic Memory [53.8690170372303]
We propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities. In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory. Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks.
arXiv Detail & Related papers (2024-08-14T11:19:28Z)
Automatic Instruction Evolving for Large Language Models [93.52437926313621]
Auto Evol-Instruct is an end-to-end framework that evolves instruction datasets using large language models without any human effort. Our experiments demonstrate that the best method optimized by Auto Evol-Instruct outperforms human-designed methods on various benchmarks.
arXiv Detail & Related papers (2024-06-02T15:09:00Z)
Mosaic-IT: Free Compositional Data Augmentation Improves Instruction Tuning [30.82220015525281]
Mosaic Instruction Tuning (Mosaic-IT) is a human/model-free compositional data augmentation method. Mosaic-IT randomly creates rich and diverse augmentations from existing instruction tuning data. Our evaluations demonstrate a superior performance and training efficiency of Mosaic-IT.
arXiv Detail & Related papers (2024-05-22T04:08:20Z)
Instruction Matters: A Simple yet Effective Task Selection for Optimized Instruction Tuning of Specific Tasks [51.15473776489712]
We introduce a simple yet effective task selection method that leverages instruction information alone to identify relevant tasks. Our method is significantly more efficient than traditional approaches, which require complex measurements of pairwise transferability between tasks or the creation of data samples for the target task. Experimental results demonstrate that training on a small set of tasks, chosen solely on the instructions, results in substantial improvements in performance on benchmarks such as P3, Big-Bench, NIV2, and Big-Bench Hard.
arXiv Detail & Related papers (2024-04-25T08:49:47Z)
One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets. We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Language Model-based Instruction Operators [9.004528034920266]
InstOptima treats instruction generation as an evolutionary multi-objective optimization problem. We introduce an objective-guided mechanism for operators, allowing the LLM to comprehend the objectives and enhance the quality of the generated instructions. Experimental results demonstrate improved fine-tuning performance and the generation of a diverse set of high-quality instructions.
arXiv Detail & Related papers (2023-10-26T17:48:45Z)
Robust Prompt Optimization for Large Language Models Against Distribution Shifts [80.6757997074956]
Large Language Models (LLMs) have demonstrated significant ability in various Natural Language Processing tasks. We propose a new problem of robust prompt optimization for LLMs against distribution shifts. This problem requires that the prompt optimized over the labeled source group simultaneously generalize to an unlabeled target group.
arXiv Detail & Related papers (2023-05-23T11:30:43Z)
Large Language Models Are Human-Level Prompt Engineers [31.98042013940282]
We propose Automatic Prompt Engineer for automatic instruction generation and selection. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness.
arXiv Detail & Related papers (2022-11-03T15:43:03Z)
Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency. Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)