Can LMs Learn New Entities from Descriptions? Challenges in Propagating
Injected Knowledge
- URL: http://arxiv.org/abs/2305.01651v1
- Date: Tue, 2 May 2023 17:59:46 GMT
- Title: Can LMs Learn New Entities from Descriptions? Challenges in Propagating
Injected Knowledge
- Authors: Yasumasa Onoe, Michael J.Q. Zhang, Shankar Padmanabhan, Greg Durrett,
Eunsol Choi
- Abstract summary: We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
- Score: 72.63368052592004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (LMs) are used for knowledge intensive tasks like
question answering, but their knowledge gets continuously outdated as the world
changes. Prior work has studied targeted updates to LMs, injecting individual
facts and evaluating whether the model learns these facts while not changing
predictions on other contexts. We take a step forward and study LMs' abilities
to make inferences based on injected facts (or propagate those facts): for
example, after learning that something is a TV show, does an LM predict that
you can watch it? We study this with two cloze-style tasks: an existing dataset
of real-world sentences about novel entities (ECBD) as well as a new controlled
benchmark with manually designed templates requiring varying levels of
inference about injected knowledge. Surprisingly, we find that existing methods
for updating knowledge (gradient-based fine-tuning and modifications of this
approach) show little propagation of injected knowledge. These methods improve
performance on cloze instances only when there is lexical overlap between
injected facts and target inferences. Yet, prepending entity definitions in an
LM's context improves performance across all settings, suggesting that there is
substantial headroom for parameter-updating approaches for knowledge injection.
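
The contrast the abstract draws between parameter updates and in-context prepending can be made concrete with a small probe. The sketch below is not the paper's released code: it scores a cloze-style probe sentence about a hypothetical novel entity with and without the entity's definition prepended to the context, using an off-the-shelf causal LM from HuggingFace Transformers. The model name ("gpt2"), the invented entity "Riverhollow", the probe sentence, and the per-token loss metric are all illustrative assumptions.

```python
# Minimal sketch of the "prepend definition" baseline described in the abstract.
# The model, entity, definition, and probe below are placeholders, not the
# paper's actual data or evaluation code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical novel entity, its definition, and a cloze-style probe sentence
# (mirroring the abstract's "it's a TV show, so you can watch it" example).
definition = "Riverhollow is a 2023 television drama series that airs weekly."
probe = "Last night I watched the newest episode of Riverhollow."

def probe_loss(context: str, target: str) -> float:
    """Average cross-entropy of the target tokens, optionally conditioned on a context prefix."""
    target_text = (" " + target) if context else target
    prefix_ids = tok(context, return_tensors="pt").input_ids if context else None
    target_ids = tok(target_text, return_tensors="pt").input_ids
    if prefix_ids is not None:
        input_ids = torch.cat([prefix_ids, target_ids], dim=1)
        n_prefix = prefix_ids.shape[1]
    else:
        input_ids, n_prefix = target_ids, 0
    labels = input_ids.clone()
    labels[:, :n_prefix] = -100  # only the probe tokens contribute to the loss
    with torch.no_grad():
        return model(input_ids=input_ids, labels=labels).loss.item()

print("probe loss, no definition   :", probe_loss("", probe))
print("probe loss, with definition :", probe_loss(definition, probe))
```

A lower loss with the definition prepended corresponds to the in-context gains the abstract reports; a parameter-updating method would instead fine-tune on the definition and then score the probe with an empty context, which is where the paper finds little propagation.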
Related papers
- Gradient Localization Improves Lifelong Pretraining of Language Models [32.29298047707914]
Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters.
In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs.
arXiv Detail & Related papers (2024-11-07T05:43:50Z)
- Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning [2.8972337324168014]
We study how PLMs may learn and remember new world knowledge facts that do not occur in their pre-training corpus.
We first propose Novel-WD, a new dataset consisting of sentences containing novel facts extracted from recent Wikidata updates.
We make this dataset freely available to the community, and release a procedure to later build new versions of similar datasets with up-to-date information.
arXiv Detail & Related papers (2024-08-30T07:54:50Z)
- Detecting Edited Knowledge in Language Models [5.260519479124422]
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training.
Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models.
We propose a novel task: detecting edited knowledge in language models.
arXiv Detail & Related papers (2024-05-04T22:02:24Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Propagating Knowledge Updates to LMs Through Distillation [97.3628651636153]
We show that a context-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences.
Our experiments demonstrate that this approach is more effective at propagating knowledge updates than fine-tuning and other gradient-based knowledge-editing methods.
arXiv Detail & Related papers (2023-06-15T17:39:50Z)
- Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory (a generic sketch of this idea appears after this list).
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Enhancing Language Models with Plug-and-Play Large-Scale Commonsense [2.1248439796866228]
We study how to enhance language models (LMs) with textual commonsense knowledge.
We propose a plug-and-play method for large-scale commonsense integration without pre-training.
arXiv Detail & Related papers (2021-09-06T16:16:10Z)
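
The PlugLM entry above refers forward to the following sketch: a generic, editable key-value memory kept outside the LM's weights, intended only to illustrate the "decouple knowledge from parameters" idea. It is not PlugLM's released implementation; the slot count, dimensions, and top-k retrieval rule are assumptions.

```python
# Generic sketch of an editable key-value memory held outside the LM's
# parameters. Illustrative only; not PlugLM's actual architecture.
import torch
import torch.nn.functional as F

class KeyValueMemory(torch.nn.Module):
    def __init__(self, num_slots: int = 1024, key_dim: int = 256, value_dim: int = 256):
        super().__init__()
        # Memory slots live outside the host LM's weights, so they can be
        # edited or extended without retraining the model.
        self.keys = torch.nn.Parameter(torch.randn(num_slots, key_dim))
        self.values = torch.nn.Parameter(torch.randn(num_slots, value_dim))

    def forward(self, query: torch.Tensor, top_k: int = 4) -> torch.Tensor:
        # query: (batch, key_dim). Return a soft mixture of the top-k slots.
        scores = query @ self.keys.T                     # (batch, num_slots)
        top_scores, top_idx = scores.topk(top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)          # (batch, top_k)
        picked = self.values[top_idx]                    # (batch, top_k, value_dim)
        return (weights.unsqueeze(-1) * picked).sum(dim=1)

    @torch.no_grad()
    def write(self, slot: int, key: torch.Tensor, value: torch.Tensor) -> None:
        # Editing knowledge = overwriting a slot; the LM's parameters are untouched.
        self.keys[slot] = key
        self.values[slot] = value
```

A host LM would form the query from one of its hidden states and fuse the returned value back into its representation; new or corrected facts are added by writing slots rather than by gradient updates to the model.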