Propagating Knowledge Updates to LMs Through Distillation
- URL: http://arxiv.org/abs/2306.09306v2
- Date: Tue, 31 Oct 2023 00:29:12 GMT
- Title: Propagating Knowledge Updates to LMs Through Distillation
- Authors: Shankar Padmanabhan, Yasumasa Onoe, Michael J.Q. Zhang, Greg Durrett, Eunsol Choi
- Abstract summary: We show that a context-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences.
Our experiments demonstrate that this approach is more effective at propagating knowledge updates than fine-tuning and other gradient-based knowledge-editing methods.
- Score: 97.3628651636153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern language models have the capacity to store and use immense amounts of
knowledge about real-world entities, but it remains unclear how to update such
knowledge stored in model parameters. While prior methods for updating
knowledge in LMs successfully inject atomic facts, updated LMs fail to make
inferences based on injected facts. In this work, we demonstrate that a context
distillation-based approach can both impart knowledge about entities and
propagate that knowledge to enable broader inferences. Our approach consists of
two stages: transfer set generation and distillation on the transfer set. We
first generate a transfer set by prompting a language model to generate
continuations from the entity definition. Then, we update the model parameters
so that the distribution of the LM (the student) matches the distribution of
the LM conditioned on the definition (the teacher) on the transfer set. Our
experiments demonstrate that this approach is more effective at propagating
knowledge updates than fine-tuning and other gradient-based knowledge-editing
methods. Moreover, it does not compromise performance in other contexts, even
when injecting the definitions of up to 150 entities at once.
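The following is a minimal sketch of the two-stage procedure described in the abstract, written against the HuggingFace transformers API. The model name ("gpt2"), the example entity definition, the prompt handling, and all hyperparameters are illustrative assumptions rather than the paper's exact setup: stage 1 samples continuations of the definition to build a transfer set, and stage 2 trains the student so that its distribution on each continuation matches a frozen teacher's distribution conditioned on the definition.

```python
# Sketch of context distillation: transfer set generation + distillation.
# Assumptions (not from the paper): "gpt2" as a stand-in model, a toy entity
# definition, and illustrative sampling/training hyperparameters.
import copy
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
teacher = copy.deepcopy(student).eval()  # frozen copy; it sees the definition in-context
for p in teacher.parameters():
    p.requires_grad_(False)

definition = "Perseverance is a NASA rover that landed in Jezero Crater on Mars in 2021."

# Stage 1: transfer set generation -- sample continuations from the entity definition.
def generate_transfer_set(n=8, max_new_tokens=40):
    prompt_ids = tokenizer(definition + " ", return_tensors="pt").input_ids.to(device)
    outputs = teacher.generate(
        prompt_ids, do_sample=True, top_p=0.9, max_new_tokens=max_new_tokens,
        num_return_sequences=n, pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the generated continuations (drop the definition prefix).
    return [tokenizer.decode(o[prompt_ids.shape[1]:], skip_special_tokens=True) for o in outputs]

# Stage 2: distillation -- match the student's distribution on each continuation
# to the teacher's distribution conditioned on the definition.
def distill(transfer_set, lr=1e-5, epochs=3):
    optim = torch.optim.AdamW(student.parameters(), lr=lr)
    def_ids = tokenizer(definition + " ", return_tensors="pt").input_ids.to(device)
    for _ in range(epochs):
        for text in transfer_set:
            cont_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            with torch.no_grad():
                # Teacher sees definition + continuation; keep logits over the continuation span.
                t_logits = teacher(torch.cat([def_ids, cont_ids], dim=1)).logits
                t_logits = t_logits[:, def_ids.shape[1]:, :]
            s_logits = student(cont_ids).logits  # student sees the continuation alone
            loss = F.kl_div(
                F.log_softmax(s_logits, dim=-1),
                F.log_softmax(t_logits, dim=-1),
                log_target=True, reduction="batchmean",
            )
            optim.zero_grad()
            loss.backward()
            optim.step()

distill(generate_transfer_set())
```

Because the teacher and student start from the same weights and differ only in whether the definition appears in context, minimizing the KL term pushes the definition's in-context effect into the student's parameters.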
Related papers
- LLMs as Repositories of Factual Knowledge: Limitations and Solutions [1.7764955091415962]
We study the appropriateness of Large Language Models (LLMs) as repositories of factual knowledge.
We evaluate their reliability in responding to time-sensitive factual questions.
We propose "ENtity-Aware Fine-tuning" (ENAF) to improve the model's performance.
arXiv Detail & Related papers (2025-01-22T10:16:53Z)
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Gradient Localization Improves Lifelong Pretraining of Language Models [32.29298047707914]
Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters.
In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs.
arXiv Detail & Related papers (2024-11-07T05:43:50Z)
- KIF: Knowledge Identification and Fusion for Language Model Continual Learning [41.28933724210434]
We introduce a novel framework for language models, named Knowledge Identification and Fusion (KIF).
KIF segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control.
It employs a novel group-wise knowledge identification technique to ascertain the importance distribution of skill units for a new task.
As a result, KIF achieves an optimal balance between retaining prior knowledge and excelling in new tasks.
arXiv Detail & Related papers (2024-08-09T17:44:45Z)
- MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation [61.65537912700187]
Large Language Models (LLMs) have demonstrated their strong ability in the field of machine translation (MT).
We propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner.
arXiv Detail & Related papers (2024-03-14T16:07:39Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)