Pay More Attention to the Robustness of Prompt for Instruction Data Mining
- URL: http://arxiv.org/abs/2503.24028v1
- Date: Mon, 31 Mar 2025 12:53:08 GMT
- Title: Pay More Attention to the Robustness of Prompt for Instruction Data Mining
- Authors: Qiang Wang, Dawei Feng, Xu Zhang, Ao Shen, Yang Xu, Bo Ding, Huaimin Wang
- Abstract summary: This paper proposes a pioneering framework for mining high-quality online instruction data for instruction tuning. Our notable innovation is to generate adversarial instruction data by attacking the prompts of online instruction data. We conduct extensive experiments on two benchmark datasets to assess the performance.
- Score: 15.350709684929116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning has emerged as a paramount method for tailoring the behaviors of LLMs. Recent work has unveiled the potential for LLMs to achieve high performance through fine-tuning with a limited quantity of high-quality instruction data. Building upon this approach, we further explore the impact of prompt robustness on the selection of high-quality instruction data. This paper proposes a pioneering framework for mining high-quality online instruction data for instruction tuning, focusing on the impact of prompt robustness on the data mining process. Our notable innovation is to generate adversarial instruction data by attacking the prompts of online instruction data. We then introduce an Adversarial Instruction-Following Difficulty metric to measure how much the adversarial instruction data helps the generation of the corresponding response. In addition, we propose a novel Adversarial Instruction Output Embedding Consistency approach to select high-quality online instruction data. Extensive experiments on two benchmark datasets demonstrate the effectiveness of the two proposed methods and underscore the practical significance of accounting for prompt robustness.
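The abstract names two selection signals but gives no formulas here. The following is a minimal sketch of how such signals could be computed, assuming the Adversarial Instruction-Following Difficulty metric follows the perplexity-ratio form of the earlier Instruction-Following Difficulty (IFD) metric and that the prompt attack is a simple character-level perturbation; the model choices (`gpt2`, `all-MiniLM-L6-v2`) and all function names are illustrative placeholders, not the authors' implementation.

```python
# Sketch of adversarial-prompt data-selection signals. Assumptions are
# noted inline; nothing here is the paper's released code.
import math
import random

import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def perturb_prompt(prompt: str, swap_rate: float = 0.05) -> str:
    """Toy character-swap attack standing in for the paper's prompt attack."""
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and random.random() < swap_rate:
            chars[i] = random.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)


def response_perplexity(prompt: str, response: str) -> float:
    """Perplexity of the response tokens, conditioned on an (optional) prompt.

    Prompt formatting is deliberately simplified to plain concatenation.
    """
    enc = tokenizer(prompt + response, return_tensors="pt")
    prompt_len = len(tokenizer(prompt)["input_ids"])
    labels = enc["input_ids"].clone()
    labels[:, :prompt_len] = -100  # score only the response tokens
    with torch.no_grad():
        loss = lm(**enc, labels=labels).loss
    return math.exp(loss.item())


def adversarial_ifd(prompt: str, response: str) -> float:
    """Assumed AIFD form: an IFD-style perplexity ratio on the attacked prompt.

    Values well below 1 mean the perturbed prompt still helps the model
    produce the response; values near or above 1 flag fragile prompts.
    """
    adv = perturb_prompt(prompt)
    return response_perplexity(adv, response) / response_perplexity("", response)


def output_embedding_consistency(output_original: str, output_adversarial: str) -> float:
    """Cosine similarity between outputs generated from the original and the
    adversarial prompt; higher suggests a more robust training sample."""
    emb = embedder.encode([output_original, output_adversarial], convert_to_tensor=True)
    return torch.nn.functional.cosine_similarity(emb[0:1], emb[1:2]).item()
```

Under this reading, data selection would keep samples whose AIFD falls in a useful range and whose outputs remain consistent under attack; the paper's actual attack method and thresholds are not reproduced here.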
Related papers
- Robust Offline Imitation Learning Through State-level Trajectory Stitching [37.281554320048755]
Imitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills through expert demonstrations.
Recent advances in offline IL have incorporated suboptimal, unlabeled datasets into training.
We propose a novel approach to enhance policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics.
arXiv Detail & Related papers (2025-03-28T15:28:36Z)
- Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering [66.5524727179286]
NOVA is a framework designed to identify high-quality data that aligns well with the learned knowledge to reduce hallucinations.
It includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data.
To ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity.
arXiv Detail & Related papers (2025-02-11T08:05:56Z)
- Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data.
In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language.
Our method effectively generates instructions that enhance reasoning performance at a reduced prompt training cost.
arXiv Detail & Related papers (2024-12-19T17:03:02Z)
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts.
With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS).
Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.
High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose TIVE, a high-value data selection approach, to eliminate redundancy within visual instruction data and reduce the training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- A Survey on Data Selection for LLM Instruction Tuning [18.94987580516951]
We propose a new taxonomy of data selection methods and provide a detailed introduction to recent advances.
We emphasize the open challenges and present new frontiers of this task.
arXiv Detail & Related papers (2024-02-04T13:32:01Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset (a generic top-fraction selection loop of this kind is sketched after this list).
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- ALLSH: Active Learning Guided by Local Sensitivity and Hardness [98.61023158378407]
We propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function.
Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks.
arXiv Detail & Related papers (2022-05-10T15:39:11Z)
- Compressive Features in Offline Reinforcement Learning for Recommender Systems [2.3513645401551333]
We develop a recommender system for a game that suggests potential items to players based on their interactive behaviors to maximize revenue for the game provider.
Our approach is built on a reinforcement learning-based technique and is trained on an offline dataset that is publicly available from an IEEE Big Data Cup challenge.
arXiv Detail & Related papers (2021-11-16T22:43:16Z)
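Several of the selection methods above share a common pattern: score every sample with a quality signal, then keep only the best fraction (e.g. Nuggets' top 1%, TIVE's roughly 15%). The sketch below illustrates that generic pattern only; `select_top_fraction`, the score function, and the fraction values are assumptions, not any single paper's procedure.

```python
# Generic "score every sample, keep the top fraction" selection loop.
from typing import Callable


def select_top_fraction(
    dataset: list[dict],
    score_fn: Callable[[dict], float],
    fraction: float = 0.01,
) -> list[dict]:
    """Rank samples by a quality score and keep the best `fraction` of them."""
    ranked = sorted(dataset, key=score_fn, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical usage with the consistency signal sketched after the abstract,
# assuming each sample stores outputs for the original and attacked prompts:
# selected = select_top_fraction(
#     data,
#     lambda s: output_embedding_consistency(s["output"], s["output_adv"]),
#     fraction=0.15,
# )
```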