Instruction Position Matters in Sequence Generation with Large Language
Models
- URL: http://arxiv.org/abs/2308.12097v1
- Date: Wed, 23 Aug 2023 12:36:57 GMT
- Title: Instruction Position Matters in Sequence Generation with Large Language
Models
- Authors: Yijin Liu, Xianfeng Zeng, Fandong Meng, Jie Zhou
- Abstract summary: Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
- Score: 67.87516654892343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are capable of performing conditional sequence
generation tasks, such as translation or summarization, through instruction
fine-tuning. The fine-tuning data is generally sequentially concatenated from a
specific task instruction, an input sentence, and the corresponding response.
Considering the locality modeled by the self-attention mechanism of LLMs, these
models face the risk of instruction forgetting when generating responses for
long input sentences. To mitigate this issue, we propose enhancing the
instruction-following capability of LLMs by shifting the position of task
instructions after the input sentences. Theoretical analysis suggests that our
straightforward method can alter the model's learning focus, thereby
emphasizing the training of instruction-following capabilities. Concurrently,
experimental results demonstrate that our approach consistently outperforms
traditional settings across various model scales (1B / 7B / 13B) and different
sequence generation tasks (translation and summarization), without any
additional data or annotation costs. Notably, our method significantly improves
the zero-shot performance on conditional sequence generation, e.g., up to 9.7
BLEU points on WMT zero-shot translation tasks.
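The difference between the conventional layout and the proposed one can be illustrated with a small formatting sketch. This is only an illustration under stated assumptions, not the authors' code: the helper names `build_example` and `gap_to_response`, the newline separator, and the sample sentences are all hypothetical and do not come from the paper. The sketch shows that placing the instruction after the input keeps it adjacent to the response no matter how long the input is.

```python
from typing import Tuple


def build_example(instruction: str, source: str, response: str,
                  position: str = "post") -> Tuple[str, str]:
    """Return (prompt, full training string) for one sample.

    position="pre"  -> instruction, input, response (conventional layout)
    position="post" -> input, instruction, response (layout described in the abstract)
    The newline separator is an assumption made for this sketch.
    """
    if position == "pre":
        parts = [instruction, source]
    elif position == "post":
        parts = [source, instruction]
    else:
        raise ValueError(f"unknown position: {position!r}")
    prompt = "\n".join(parts) + "\n"
    return prompt, prompt + response


def gap_to_response(prompt: str, instruction: str) -> int:
    """Number of characters between the end of the instruction and the response."""
    return len(prompt) - (prompt.index(instruction) + len(instruction))


# Hypothetical sample data, with a deliberately long input.
instruction = "Translate the following sentence from German to English."
source = "Das ist ein sehr langer Eingabesatz. " * 20
response = "This is a very long input sentence."

pre_prompt, pre_full = build_example(instruction, source, response, "pre")
post_prompt, post_full = build_example(instruction, source, response, "post")

print("pre-instruction gap: ", gap_to_response(pre_prompt, instruction))
print("post-instruction gap:", gap_to_response(post_prompt, instruction))
```

In the "pre" layout the gap grows with the input length, while in the "post" layout it stays constant, which is the intuition behind the instruction-forgetting concern raised in the abstract. A character count is only a rough proxy here; a real pipeline would measure the distance in tokens.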
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
- Aligning Instruction Tuning with Pre-training [81.4748965653345]
We propose Aligning Instruction Tuning with Pre-training (AITP) to align instruction tuning with pre-training distributions.
We show consistent performance improvements with AITP on three fully open large language models (LLMs) across eight benchmarks.
arXiv Detail & Related papers (2025-01-16T08:27:40Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning [13.535110749767451]
We propose a novel continual instruction tuning method based on Key-part Information Gain (KPIG).
Our method computes the information gain on masked parts to dynamically replay data and refine the training objective.
Experiments demonstrate our method achieves superior performance on both seen and held-out tasks.
arXiv Detail & Related papers (2024-03-15T06:54:20Z)
- Fine-tuning Large Language Models with Sequential Instructions [2.546845645875049]
We find that existing instruction-tuned models struggle to respond to queries with multiple instructions.
We contend that part of the fine-tuning data mixture should be sequential--containing a chain of interrelated tasks.
We automate this process by turning instructions in existing datasets into diverse and complex sequential instructions.
Models that underwent our sequential instruction tuning show improved results in coding, maths, and open-ended generation.
arXiv Detail & Related papers (2024-03-12T16:33:30Z)
- In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax [36.98247762224868]
In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks.
Do models infer the underlying structure of the task defined by the context, or do they rely on superficial generalizations that only generalize to identically distributed examples?
In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs.
The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size.
arXiv Detail & Related papers (2023-11-13T23:52:43Z)
- Task-guided Disentangled Tuning for Pretrained Language Models [16.429787408467703]
We propose Task-guided Disentangled Tuning (TDT) for pretrained language models (PLMs).
TDT enhances the generalization of representations by disentangling task-relevant signals from entangled representations.
Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs.
arXiv Detail & Related papers (2022-03-22T03:11:39Z)
- Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM)-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.