Propagating Knowledge Updates to LMs Through Distillation
- URL: http://arxiv.org/abs/2306.09306v2
- Date: Tue, 31 Oct 2023 00:29:12 GMT
- Title: Propagating Knowledge Updates to LMs Through Distillation
- Authors: Shankar Padmanabhan, Yasumasa Onoe, Michael J.Q. Zhang, Greg Durrett,
Eunsol Choi
- Abstract summary: We show that a context distillation-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences.
Our experiments demonstrate that this approach is more effective at propagating knowledge updates than fine-tuning and other gradient-based knowledge-editing methods.
- Score: 97.3628651636153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern language models have the capacity to store and use immense amounts of
knowledge about real-world entities, but it remains unclear how to update such
knowledge stored in model parameters. While prior methods for updating
knowledge in LMs successfully inject atomic facts, updated LMs fail to make
inferences based on injected facts. In this work, we demonstrate that a context
distillation-based approach can both impart knowledge about entities and
propagate that knowledge to enable broader inferences. Our approach consists of
two stages: transfer set generation and distillation on the transfer set. We
first generate a transfer set by prompting a language model to generate
continuations from the entity definition. Then, we update the model parameters
so that the distribution of the LM (the student) matches the distribution of
the LM conditioned on the definition (the teacher) on the transfer set. Our
experiments demonstrate that this approach is more effective at propagating
knowledge updates than fine-tuning and other gradient-based knowledge-editing
methods. Moreover, it does not compromise performance in other contexts, even
when injecting the definitions of up to 150 entities at once.
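The two-stage recipe in the abstract can be made concrete with a short sketch. The snippet below is a minimal, illustrative implementation of context distillation, assuming a generic Hugging Face causal LM (gpt2 stands in for the actual models); the definition text, prompts, sampling settings, loss reduction, and learning rate are assumptions for illustration, not the paper's reported configuration.

```python
# Minimal sketch of the two-stage context distillation update described in the
# abstract. Model names, the example definition, and hyperparameters are
# illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")                  # assumed base LM
student = AutoModelForCausalLM.from_pretrained("gpt2").to(device)  # gets updated
teacher = AutoModelForCausalLM.from_pretrained("gpt2").to(device)  # frozen copy
teacher.eval()

# Example entity definition (hypothetical; any new-entity description works).
definition = "Perseverance is a NASA rover that landed in Jezero crater on Mars in 2021."

# Stage 1: build a transfer set by sampling continuations of the definition.
def generate_transfer_set(n_samples=8, max_new_tokens=40):
    prompt_ids = tokenizer(definition, return_tensors="pt").input_ids.to(device)
    continuations = []
    for _ in range(n_samples):
        out = teacher.generate(
            prompt_ids, do_sample=True, top_p=0.9,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        # Keep only the sampled continuation; drop the definition prefix.
        continuations.append(
            tokenizer.decode(out[0, prompt_ids.shape[1]:], skip_special_tokens=True)
        )
    return continuations

# Stage 2: distill -- make the student's next-token distribution on each
# continuation match the teacher's distribution when the teacher also sees
# the definition in context.
def distill_step(continuation, optimizer):
    def_ids = tokenizer(definition, return_tensors="pt").input_ids.to(device)
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids.to(device)

    with torch.no_grad():
        # Teacher reads definition + continuation; keep only the logits at
        # continuation positions, so teacher position D+i aligns with student position i.
        t_logits = teacher(torch.cat([def_ids, cont_ids], dim=1)).logits
        t_logits = t_logits[:, def_ids.shape[1]:, :]
    s_logits = student(cont_ids).logits        # student sees the continuation alone

    # KL(teacher || student), summed over the vocabulary, averaged over positions.
    loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.log_softmax(t_logits, dim=-1),
        log_target=True, reduction="none",
    ).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
for text in generate_transfer_set():
    distill_step(text, optimizer)
```

After distillation, the student is meant to reproduce its own definition-conditioned behavior without the definition in context, which is what lets the update propagate to inferences beyond the injected fact.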
Related papers
- Gradient Localization Improves Lifelong Pretraining of Language Models [32.29298047707914]
Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters.
In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs.
arXiv Detail & Related papers (2024-11-07T05:43:50Z)
- MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation [61.65537912700187]
Large Language Models (LLMs) have demonstrated strong ability in the field of machine translation (MT).
We propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner.
arXiv Detail & Related papers (2024-03-14T16:07:39Z) - Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z) - Can LMs Learn New Entities from Descriptions? Challenges in Propagating
Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (i.e., to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z)
- Enhancing Language Models with Plug-and-Play Large-Scale Commonsense [2.1248439796866228]
We study how to enhance language models (LMs) with textual commonsense knowledge.
We propose a plug-and-play method for large-scale commonsense integration without pre-training.
arXiv Detail & Related papers (2021-09-06T16:16:10Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
- Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)