Optimizing and Testing Instruction-Following: Analyzing the Impact of Fine-Grained Instruction Variants on instruction-tuned LLMs
- URL: http://arxiv.org/abs/2406.11301v1
- Date: Mon, 17 Jun 2024 08:08:11 GMT
- Title: Optimizing and Testing Instruction-Following: Analyzing the Impact of Fine-Grained Instruction Variants on instruction-tuned LLMs
- Authors: Jiuding Yang, Weidong Guo, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu
- Abstract summary: We introduce an effective data augmentation technique that decomposes complex instructions into simpler sub-components, modifies these, and reconstructs them into new variants.
Our findings show that LLMs fine-tuned with DeMoRecon gain a significant performance boost on both our benchmark and commonly used instruction-following benchmarks.
- Score: 27.321629102942754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation technique that decomposes complex instructions into simpler sub-components, modifies these, and reconstructs them into new variants, thereby preserving the original instruction's context and complexity while introducing variability, which is critical for training and evaluating LLMs' instruction-following precision. We developed the DeMoRecon dataset using this method to both fine-tune and evaluate LLMs. Our findings show that LLMs fine-tuned with DeMoRecon gain a significant performance boost on both our benchmark and commonly used instruction-following benchmarks.
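The decompose-modify-reconstruct pipeline described in the abstract can be pictured with a minimal sketch. The function names and the rule-based constraint swapping below are illustrative assumptions only; the paper's actual method is not reproduced here and presumably relies on an LLM to perform each step.

```python
# Minimal sketch of a decompose-modify-reconstruct augmentation step.
# All names and the substitution table are hypothetical illustrations.
from dataclasses import dataclass
import random

@dataclass
class Instruction:
    task: str                 # core request, e.g. "Summarize the article"
    constraints: list[str]    # fine-grained sub-components, e.g. "in three bullets"

def decompose(raw: str) -> Instruction:
    """Split a complex instruction into its core task and constraint clauses."""
    task, *constraints = [part.strip() for part in raw.split(";")]
    return Instruction(task, constraints)

def modify(inst: Instruction, alternatives: dict[str, list[str]]) -> Instruction:
    """Swap one constraint for a semantically different alternative."""
    new_constraints = list(inst.constraints)
    if new_constraints:
        i = random.randrange(len(new_constraints))
        options = alternatives.get(new_constraints[i], [new_constraints[i]])
        new_constraints[i] = random.choice(options)
    return Instruction(inst.task, new_constraints)

def reconstruct(inst: Instruction) -> str:
    """Reassemble the task and constraints into a new instruction variant."""
    return "; ".join([inst.task, *inst.constraints])

# Usage: produce a variant that keeps context and complexity but changes a detail.
raw = "Summarize the article; use exactly three bullet points; keep a neutral tone"
variant = reconstruct(modify(decompose(raw), {
    "use exactly three bullet points": ["use exactly five bullet points"],
    "keep a neutral tone": ["adopt an enthusiastic tone"],
}))
print(variant)
```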
Related papers
- MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs [47.94710556156627]
MIA-Bench is a benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.
Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions.
arXiv Detail & Related papers (2024-07-01T17:53:35Z) - A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization [17.38671584773247]
This research investigates prompt designs for evaluating generated texts using large language models (LLMs).
We found that the order of presenting reasons and scores significantly influences LLMs' scoring.
An additional optimization may enhance scoring alignment if sufficient data is available.
arXiv Detail & Related papers (2024-06-14T12:31:44Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning [12.651588927599441]
We introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR)
TAPIR is a multi-round distillation framework with balanced task distributions and dynamic difficulty adjustment.
We rigorously evaluate TAPIR using two widely recognized benchmarks, including AlpacaEval 2.0 and MT-Bench.
arXiv Detail & Related papers (2024-05-22T08:38:26Z) - Mosaic IT: Enhancing Instruction Tuning with Data Mosaics [30.82220015525281]
We introduce Mosaic Instruction Tuning (Mosaic-IT), a human/model-free method for finetuning large language models.
Mosaic-IT randomly concatenates multiple instruction examples into one and trains the model to produce the corresponding responses.
Our evaluations demonstrate superior performance and training efficiency of Mosaic-IT.
arXiv Detail & Related papers (2024-05-22T04:08:20Z) - What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning [115.19451843294154]
Visual instruction tuning is an essential approach to improving the zero-shot generalization capability of Multi-modal Large Language Models (MLLMs).
We propose a systematic approach to automatically creating high-quality complex visual reasoning instructions.
Our dataset consistently enhances the performance of all the compared MLLMs, e.g., improving the performance of MiniGPT-4 and BLIP-2 on MME-Cognition by 32.6% and 28.8%, respectively.
arXiv Detail & Related papers (2023-11-02T15:36:12Z) - Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning.
We propose a novel method, termed "reflection-tuning".
This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data.
arXiv Detail & Related papers (2023-10-18T05:13:47Z) - Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability (a rough sketch follows this entry).
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
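The IFD metric above is often described as comparing the model's loss on a response with and without its instruction. The following is a rough, assumption-based sketch of such a score using Hugging Face transformers with a placeholder model name; it is not the authors' released implementation.

```python
# Sketch of an IFD-style score: ratio of the response loss conditioned on the
# instruction to the response loss without it. Details are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def response_loss(prompt: str, response: str) -> float:
    """Mean cross-entropy on the response tokens, optionally conditioned on a prompt."""
    resp_ids = tok(response, return_tensors="pt").input_ids
    if prompt:
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, resp_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    else:
        input_ids = resp_ids
        labels = input_ids.clone()
    with torch.no_grad():
        out = model(input_ids, labels=labels)
    return out.loss.item()

def ifd_score(instruction: str, response: str) -> float:
    """Higher score = the instruction helps less, i.e. a 'harder' training sample."""
    return response_loss(instruction, response) / response_loss("", response)

print(ifd_score("Translate to French: Good morning.", "Bonjour, tout le monde."))
```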