Related papers: Instruction Tuning With Loss Over Instructions

URL: http://arxiv.org/abs/2405.14394v2
Date: Wed, 02 Oct 2024 20:36:11 GMT
Title: Instruction Tuning With Loss Over Instructions
Authors: Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani
Abstract summary: Instruction Modelling (IM) trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. We show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks and open-ended generation benchmarks. Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%.
Score: 42.9106826952674
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) The ratio between instruction length and output length in the training data; and (2) The number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH) where a small number of training examples is used for instruction tuning. Further analysis substantiates our hypothesis that our improvement can be attributed to reduced overfitting to instruction tuning datasets. It is worth noting that we are not proposing IM as a replacement for current fine-tuning processes. Instead, our work aims to provide practical guidance for instruction tuning LMs, especially in low-resource scenarios.
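The difference between conventional instruction tuning and the loss-over-instructions idea described in the abstract can be illustrated with label masking. The following is only a minimal sketch, not the authors' implementation: the helper names `build_labels` and `mean_nll`, the token IDs, and the per-token loss values are all hypothetical, and the masking value -100 is assumed as the "ignore" label. It also leaves out details such as label shifting for next-token prediction and any special handling of prompt-template tokens.

```python
IGNORE_INDEX = -100  # assumed "ignore this position" label value


def build_labels(instruction_ids, output_ids, loss_over_instruction):
    """Return (input_ids, labels) for one training example (hypothetical helper)."""
    input_ids = list(instruction_ids) + list(output_ids)
    if loss_over_instruction:
        # IM-style: instruction tokens also contribute to the loss.
        labels = list(input_ids)
    else:
        # Conventional instruction tuning: only output tokens contribute.
        labels = [IGNORE_INDEX] * len(instruction_ids) + list(output_ids)
    return input_ids, labels


def mean_nll(token_nlls, labels):
    """Average per-token negative log-likelihood over unmasked positions."""
    kept = [nll for nll, lab in zip(token_nlls, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept)


# Toy example: a long instruction paired with a short output.
instruction = [11, 12, 13, 14]
output = [21, 22]
toy_nlls = [2.0, 1.0, 1.0, 2.0, 0.5, 0.5]  # made-up per-token losses

_, it_labels = build_labels(instruction, output, loss_over_instruction=False)
_, im_labels = build_labels(instruction, output, loss_over_instruction=True)

it_loss = mean_nll(toy_nlls, it_labels)  # averaged over 2 output tokens
im_loss = mean_nll(toy_nlls, im_labels)  # averaged over all 6 tokens
```

In this toy setting the conventional objective receives a training signal from only two tokens per example, while the instruction-inclusive variant uses all six, which is one way to picture why the ratio of instruction length to output length matters.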

Related papers

InfoSteer: Steering Information Utility in Language Model Post-Training [7.756342860929851]
We present a lightweight method that encourages parametric information utilization in language models (LMs) during post-training. We find this simple guidance delivers consistent performance improvements across diverse model families, including Qwen, Gemma and Llama. Our work underscores that vanilla post-training does not fully leverage pre-training potential, and steering LMs in latent representation space offers a promising approach.
arXiv Detail & Related papers (2025-07-07T16:13:21Z)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). MIR is indicative of training data selection, training strategy schedule, and model architecture design to get better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models [11.77848664657788]
We show that instruction tuning is primarily a process where the model fits to specific task formats, rather than acquiring new knowledge or capabilities. We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task-specific features. We use our novel data synthesis method, DELIA, to transform biased features in instruction tuning into approximations of ideal features.
arXiv Detail & Related papers (2024-08-19T17:56:06Z)
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models [34.13519934563742]
We show that modern language models (LMs) need to follow human instructions while being faithful. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet). We find that less is more, as training ReSet with high-quality, yet substantially smaller data (three-fold less) yields superior results.
arXiv Detail & Related papers (2024-07-31T08:05:04Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants [28.691691883519542]
We introduce a technique that decomposes complex instructions into simpler sub-components, modifies these, and reconstructs them into new variants. Based on DeMoRecon, we developed the FGIV dataset, which contains fine-grained instruction variants of 1,773 seed instructions. Our findings show that LLMs fine-tuned with FGIV gain a significant performance boost on both our own and commonly used instruction-following benchmarks.
arXiv Detail & Related papers (2024-06-17T08:08:11Z)
Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning [13.535110749767451]
We propose a novel continual instruction tuning method based on Key-part Information Gain (KPIG). Our method computes the information gain on masked parts to dynamically replay data and refine the training objective. Experiments demonstrate our method achieves superior performance on both seen and held-out tasks.
arXiv Detail & Related papers (2024-03-15T06:54:20Z)
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning. We propose a novel method, termed "reflection-tuning". This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data.
arXiv Detail & Related papers (2023-10-18T05:13:47Z)
Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization. We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
Preference-grounded Token-level Guidance for Language Model Fine-tuning [105.88789610320426]
Aligning language models with preferences is an important problem in natural language generation. For LM training, based on the amount of supervised data, we present two *minimalist* learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks.
arXiv Detail & Related papers (2023-06-01T07:00:07Z)
LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
Boosting Natural Language Generation from Instructions with Meta-Learning [43.64522457686405]
Recent work has shown that language models (LMs) trained with multi-task instructional learning (MTIL) can solve diverse NLP tasks with improved performance compared to prompt tuning. In this paper we investigate whether meta-learning applied to MTIL can further improve generalization to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2022-10-20T22:23:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.