On the Loss of Context-awareness in General Instruction Fine-tuning
- URL: http://arxiv.org/abs/2411.02688v3
- Date: Sun, 02 Feb 2025 19:28:39 GMT
- Title: On the Loss of Context-awareness in General Instruction Fine-tuning
- Authors: Yihan Wang, Andrew Bai, Nanyun Peng, Cho-Jui Hsieh
- Abstract summary: We investigate the loss of context awareness after supervised fine-tuning.
We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning.
We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
- Score: 101.03941308894191
- Abstract: Pre-trained Large Language Models (LLMs) require post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following. However, this process can potentially harm existing capabilities learned during pre-training. In this paper, we investigate the loss of context awareness after SFT, where context awareness is defined as the ability to extract and understand information from user-provided context and respond accordingly. We identify and demonstrate that the loss of context awareness, particularly in open-source models, occurs in instruction fine-tuned LLMs when the chat template is applied to input prompts. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We demonstrate this correlation by visualizing changes in attention allocation after the chat template is applied and manually steering the attention heads. The bias can be learned from training examples that align with the model's internal knowledge and rely less on the user-provided context to generate correct responses. Based on these observations, we propose a metric to identify context-dependent examples from general instruction fine-tuning datasets. We then apply conditional instruction fine-tuning with a context-dependency indicator, enabling the model to preserve context awareness after SFT. Empirical experiments on four context-dependent downstream tasks and three pre-trained LLMs of different sizes show that our method effectively mitigates the loss of context awareness without compromising general instruction-following capabilities.
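The abstract names a context-dependency metric and a context-dependency indicator but does not spell them out. A minimal sketch of one plausible reading, assuming the metric is how much the user-provided context lowers the pre-trained model's negative log-likelihood of the reference response (the checkpoint name, threshold, and indicator token below are illustrative placeholders, not the paper's choices):

```python
# Illustrative sketch (not the paper's exact metric): score how much a training
# example depends on its user-provided context by comparing the pre-trained
# model's NLL of the reference response with and without the context shown.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any pre-trained causal LM checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

def response_nll(prompt: str, response: str) -> float:
    """Mean NLL of the response tokens given the prompt; prompt tokens are masked out."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the prompt
    with torch.no_grad():
        out = lm(full_ids, labels=labels)
    return out.loss.item()

def context_dependency(context: str, instruction: str, response: str) -> float:
    """Higher value -> the response is much easier to produce when the context is visible,
    i.e. the example genuinely requires the user-provided context."""
    nll_without = response_nll(instruction + "\n", response)
    nll_with = response_nll(context + "\n" + instruction + "\n", response)
    return nll_without - nll_with

THRESHOLD = 0.5  # hypothetical cut-off; would be tuned on the SFT dataset

def tag_example(ex: dict) -> dict:
    """Prefix context-dependent examples with an indicator token for conditional SFT."""
    indicator = "<context-dependent> " if context_dependency(
        ex["context"], ex["instruction"], ex["response"]) > THRESHOLD else ""
    ex["prompt"] = f"{indicator}{ex['context']}\n{ex['instruction']}"
    return ex
```

During conditional fine-tuning, such an indicator prefix would mark which examples genuinely require reading the context, so attention to the user-provided turn is not uniformly down-weighted.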
Related papers
- Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data.
In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language.
Our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts.
arXiv Detail & Related papers (2024-12-19T17:03:02Z)
- Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance [68.56701216210617]
In principle, one would expect models to adapt to the user context better after instruction finetuning.
We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
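A rough, self-contained illustration of the contrastive idea (the task, demonstrations, and formatting are invented for this sketch; c-ICL's actual construction and selection strategy are more involved):

```python
# Hypothetical c-ICL-style prompt for entity extraction: pair correct demonstrations
# with an incorrect one plus its correction, so the model sees both what to do
# and what to avoid before answering the query.
correct_demos = [
    ("Barack Obama visited Paris.", "PERSON: Barack Obama; LOC: Paris"),
]
incorrect_demos = [
    ("Apple released a new phone.", "PERSON: Apple",  # wrong: Apple is an ORG here
     "ORG: Apple"),
]

def build_prompt(query: str) -> str:
    parts = []
    for text, answer in correct_demos:
        parts.append(f"Text: {text}\nEntities: {answer}")
    for text, wrong, right in incorrect_demos:
        parts.append(f"Text: {text}\nIncorrect entities: {wrong}\nCorrected entities: {right}")
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)

print(build_prompt("Google opened an office in Berlin."))
```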
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- The mechanistic basis of data dependence and abrupt learning in an in-context classification task [0.3626013617212666]
We show that specific distributional properties inherent in language control the trade-off or simultaneous appearance of two forms of learning.
In-context learning is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning.
We propose that the sharp transitions in attention-based networks arise due to a specific chain of multi-layer operations necessary to achieve ICL.
arXiv Detail & Related papers (2023-12-03T20:53:41Z)
- Understanding Catastrophic Forgetting in Language Models via Implicit Inference [12.09165658395643]
We demonstrate that improving performance on tasks within the fine-tuning data distribution comes at the expense of capabilities on other tasks.
We propose Conjugate Prompting, which artificially makes the task look farther from the fine-tuning distribution while requiring the same capability.
arXiv Detail & Related papers (2023-09-18T19:28:48Z)
- In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models [5.5089506884366735]
In this paper, we propose an alternative approach, which we term In-Context Probing (ICP).
Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label.
We show that ICP performs competitively with or better than finetuning and can be particularly helpful for building classifiers on top of smaller models.
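A bare-bones sketch of the probing step, assuming a frozen causal LM and a logistic-regression probe over the last-token hidden state (the checkpoint, instruction text, and feature choice are illustrative assumptions, not the paper's setup):

```python
# In-Context Probing sketch: prepend the instruction to each input, take the
# final hidden state of the last token, and fit a lightweight probe on it
# instead of decoding a label from the LM.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
enc = AutoModel.from_pretrained("gpt2").eval()
INSTRUCTION = "Classify the sentiment of the following review as positive or negative.\n"

def contextualized_rep(text: str) -> torch.Tensor:
    ids = tok(INSTRUCTION + text, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**ids).last_hidden_state  # (1, seq_len, dim)
    return hidden[0, -1]  # representation of the last token

train_texts = ["great movie", "terrible plot"]
train_labels = [1, 0]
X = torch.stack([contextualized_rep(t) for t in train_texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, train_labels)
print(probe.predict(contextualized_rep("surprisingly enjoyable").numpy().reshape(1, -1)))
```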
arXiv Detail & Related papers (2023-05-23T15:43:04Z)
- Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
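As a rough illustration of the prompting angle (the wording below is an assumption, not the paper's exact template), attributing the context to a narrator and asking about the narrator's claim steers the answer toward the provided context and permits abstention:

```python
# Hypothetical opinion-based prompt: attribute the context to a narrator and ask
# about the narrator's claim, so the answer must come from the context rather
# than from the model's parametric knowledge, with an explicit abstention option.
def opinion_prompt(context: str, question: str) -> str:
    return (
        f"Bob said: \"{context}\"\n"
        f"According to Bob, {question}\n"
        "If Bob's statement does not contain the answer, reply \"unanswerable\"."
    )

print(opinion_prompt(
    "The Eiffel Tower was moved to Rome in 2020.",
    "where is the Eiffel Tower located?",
))
```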
arXiv Detail & Related papers (2023-03-20T17:54:58Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
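Written as a formula (a standard way to state the claim, not the paper's exact notation), the prompt is used to infer a latent concept and the prediction marginalizes over it:

```latex
% In-context prediction as implicit Bayesian inference over a latent concept \theta:
p(\text{output} \mid \text{prompt})
  = \int_{\theta} p(\text{output} \mid \theta, \text{prompt}) \,
                  p(\theta \mid \text{prompt}) \, d\theta
```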
arXiv Detail & Related papers (2021-11-03T09:12:33Z)