Instruction Position Matters in Sequence Generation with Large Language
Models
- URL: http://arxiv.org/abs/2308.12097v1
- Date: Wed, 23 Aug 2023 12:36:57 GMT
- Title: Instruction Position Matters in Sequence Generation with Large Language
Models
- Authors: Yijin Liu, Xianfeng Zeng, Fandong Meng, Jie Zhou
- Abstract summary: Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
- Score: 67.87516654892343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are capable of performing conditional sequence
generation tasks, such as translation or summarization, through instruction
fine-tuning. The fine-tuning data is generally sequentially concatenated from a
specific task instruction, an input sentence, and the corresponding response.
Considering the locality modeled by the self-attention mechanism of LLMs, these
models face the risk of instruction forgetting when generating responses for
long input sentences. To mitigate this issue, we propose enhancing the
instruction-following capability of LLMs by shifting the position of task
instructions after the input sentences. Theoretical analysis suggests that our
straightforward method can alter the model's learning focus, thereby
emphasizing the training of instruction-following capabilities. Concurrently,
experimental results demonstrate that our approach consistently outperforms
traditional settings across various model scales (1B / 7B / 13B) and different
sequence generation tasks (translation and summarization), without any
additional data or annotation costs. Notably, our method significantly improves
the zero-shot performance on conditional sequence generation, e.g., up to 9.7
BLEU points on WMT zero-shot translation tasks.
Related papers
- SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models [14.085371250265224]
Large language models (LLMs) have exhibited impressive capabilities in various domains, particularly in general language understanding.
However these models, trained on massive text data, may not be finely optimized for specific tasks triggered by instructions.
This work addresses the catastrophic forgetting in continual instruction learning for LLMs through a switching mechanism for routing computations to parameter-efficient tuned models.
arXiv Detail & Related papers (2024-07-16T14:37:33Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning [13.535110749767451]
We propose a novel continual instruction tuning method based on Key-part Information Gain (KPIG)
Our method computes the information gain on masked parts to dynamically replay data and refine the training objective.
Experiments demonstrate our method achieves superior performance on both seen and held-out tasks.
arXiv Detail & Related papers (2024-03-15T06:54:20Z) - Fine-tuning Large Language Models with Sequential Instructions [2.546845645875049]
We find that existing instruction-tuned models struggle to respond to queries with multiple instructions.
We contend that part of the fine-tuning data mixture should be sequential--containing a chain of interrelated tasks.
We automate this process by turning instructions in existing datasets into diverse and complex sequential instructions.
Models that underwent our sequential instruction tuning show improved results in coding, maths, and open-ended generation.
arXiv Detail & Related papers (2024-03-12T16:33:30Z) - In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax [36.98247762224868]
In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks.
Do models infer the underlying structure of the task defined by the context, or do they rely on superficial generalizations that only generalize to identically distributed examples?
In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs.
The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size.
arXiv Detail & Related papers (2023-11-13T23:52:43Z) - BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input
Representation [92.75908003533736]
We propose a framework-level robust sequence-to-sequence learning approach, named BLISS, via self-supervised input representation.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
arXiv Detail & Related papers (2022-04-16T16:19:47Z) - Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word
Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM)-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: pre-trains succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.