Gradient Localization Improves Lifelong Pretraining of Language Models
- URL: http://arxiv.org/abs/2411.04448v1
- Date: Thu, 07 Nov 2024 05:43:50 GMT
- Title: Gradient Localization Improves Lifelong Pretraining of Language Models
- Authors: Jared Fernandez, Yonatan Bisk, Emma Strubell,
- Abstract summary: Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters.
In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs.
- Score: 32.29298047707914
- License:
- Abstract: Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters. However, the mechanism by which language models store different types of knowledge is poorly understood. In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs. We hypothesize that the lack of consideration of the locality of knowledge in existing continual learning methods contributes to both: the failed uptake of new information, and catastrophic forgetting of previously learned information. We observe that sequences containing references to updated and newly mentioned entities exhibit larger gradient norms in a subset of layers. We demonstrate that targeting parameter updates to these relevant layers can improve the performance of continually pretraining on language containing temporal drift.
Related papers
- When Context Leads but Parametric Memory Follows in Large Language Models [4.567122178196834]
Large language models (LLMs) have demonstrated remarkable progress in leveraging diverse knowledge sources.
This study investigates how nine widely used LLMs allocate knowledge between local context and global parameters when answering open-ended questions.
arXiv Detail & Related papers (2024-09-13T00:03:19Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models [74.81091933317882]
We introduce EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database.
We uncover that existing continual learning baselines suffer from updating and removing outdated knowledge.
Our work aims to model the dynamic nature of real-world information, suggesting faithful evaluations of the evolution-adaptability of language models.
arXiv Detail & Related papers (2023-11-14T12:12:02Z) - Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z) - Propagating Knowledge Updates to LMs Through Distillation [97.3628651636153]
We show that a context-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences.
Our experiments demonstrate that this approach is more effective at propagating knowledge updates than fine-tuning and other gradient-based knowledge-editing methods.
arXiv Detail & Related papers (2023-06-15T17:39:50Z) - Can LMs Learn New Entities from Descriptions? Challenges in Propagating
Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts)
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Improving Temporal Generalization of Pre-trained Language Models with
Lexical Semantic Change [28.106524698188675]
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability.
We propose a simple yet effective lexical-level masking strategy to post-train a converged language model.
arXiv Detail & Related papers (2022-10-31T08:12:41Z) - Lifelong Learning Natural Language Processing Approach for Multilingual
Data Classification [1.3999481573773074]
We propose a lifelong learning-inspired approach, which allows for fake news detection in multiple languages.
The ability of models to generalize the knowledge acquired between the analyzed languages was also observed.
arXiv Detail & Related papers (2022-05-25T10:34:04Z) - How transfer learning impacts linguistic knowledge in deep NLP models? [22.035813865470956]
Deep NLP models learn non-trivial amount of linguistic knowledge, captured at different layers of the model.
We investigate how fine-tuning towards downstream NLP tasks impacts the learned linguistic knowledge.
arXiv Detail & Related papers (2021-05-31T17:43:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.