The Effect of Masking Strategies on Knowledge Retention by Language
Models
- URL: http://arxiv.org/abs/2306.07185v1
- Date: Mon, 12 Jun 2023 15:35:23 GMT
- Title: The Effect of Masking Strategies on Knowledge Retention by Language
Models
- Authors: Jonas Wallat, Tianyi Zhang, Avishek Anand
- Abstract summary: This paper aims to understand the effect of pre-training tasks on the amount of knowledge captured and forgotten by language models.
We test the model's knowledge retention by measuring its ability to answer factual questions.
Our findings demonstrate that, like the ability to perform a task, the knowledge acquired from being trained on that task is forgotten when a model is trained to perform another task.
- Score: 9.130890741447422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models retain a significant amount of world knowledge from their
pre-training stage. This allows knowledgeable models to be applied to
knowledge-intensive tasks prevalent in information retrieval, such as ranking
or question answering. Understanding how and which factual information is
acquired by our models is necessary to build responsible models. However,
limited work has been done to understand the effect of pre-training tasks on
the amount of knowledge captured and forgotten by language models during
pre-training. Building a better understanding of knowledge acquisition is the
goal of this paper. We therefore use a selection of pre-training tasks to
infuse knowledge into our model and then test the model's knowledge retention
by measuring its ability to answer factual questions. Our experiments show
that masking entities and principled masking of correlated spans based on
pointwise mutual information lead to more factual knowledge being retained
than masking random tokens. Our findings demonstrate that, like the ability to
perform a task, the (factual) knowledge acquired from training on that task is
forgotten when the model is subsequently trained on another task (catastrophic
forgetting), and we show how to prevent this phenomenon. To foster
reproducibility, the code and the data used in this paper are openly
available.
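To make the comparison concrete, here is a minimal sketch (not the authors' implementation; function names such as `random_token_mask`, `bigram_pmi`, and `pmi_span_mask` are illustrative assumptions) contrasting BERT-style random token masking with PMI-based span masking on a whitespace-tokenized toy corpus. A real setup would estimate PMI over the full pre-training corpus, handle subword tokenization, and support spans longer than bigrams; entity masking would analogously hide all tokens of an annotated entity mention at once.
```python
# Toy illustration (not the authors' code): random token masking vs. a
# PMI-based span masking variant. All function names are hypothetical.
import math
import random
from collections import Counter

MASK = "[MASK]"

def random_token_mask(tokens, ratio=0.15, rng=None):
    """BERT-style masking: replace a random subset of individual tokens."""
    rng = rng or random.Random(0)
    n = max(1, int(len(tokens) * ratio))
    positions = set(rng.sample(range(len(tokens)), n))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

def bigram_pmi(corpus):
    """Estimate PMI(x, y) = log( p(x, y) / (p(x) p(y)) ) for adjacent token pairs."""
    unigrams, bigrams = Counter(), Counter()
    total_uni = total_bi = 0
    for sent in corpus:
        unigrams.update(sent)
        total_uni += len(sent)
        pairs = list(zip(sent, sent[1:]))
        bigrams.update(pairs)
        total_bi += len(pairs)
    return {
        (x, y): math.log(
            (c / total_bi) / ((unigrams[x] / total_uni) * (unigrams[y] / total_uni))
        )
        for (x, y), c in bigrams.items()
    }

def pmi_span_mask(tokens, pmi, threshold=2.0):
    """Mask whole bigram spans whose PMI exceeds the threshold, so strongly
    correlated tokens (e.g. the parts of an entity name) are hidden together."""
    masked, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and pmi.get((tokens[i], tokens[i + 1]), 0.0) > threshold:
            masked += [MASK, MASK]
            i += 2
        else:
            masked.append(tokens[i])
            i += 1
    return masked

corpus = [
    "albert einstein was born in ulm".split(),
    "albert einstein developed general relativity".split(),
    "the city of ulm lies in germany".split(),
]
pmi = bigram_pmi(corpus)
print(random_token_mask(corpus[0]))   # masks isolated tokens
print(pmi_span_mask(corpus[0], pmi))  # masks high-PMI spans such as "albert einstein"
```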
Related papers
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which combines multiple pieces of knowledge to infer new knowledge.
arXiv Detail & Related papers (2024-06-11T15:58:59Z)
- Large Scale Knowledge Washing [24.533316191149677]
Large language models show impressive abilities in memorizing world knowledge.
We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge.
arXiv Detail & Related papers (2024-05-26T23:29:49Z)
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
Theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- Learning with Recoverable Forgetting [77.56338597012927]
Learning wIth Recoverable Forgetting (LIRF) explicitly handles task- or sample-specific knowledge removal and recovery.
Specifically, LIRF brings in two innovative schemes, namely knowledge deposit and withdrawal.
We conduct experiments on several datasets, and demonstrate that the proposed LIRF strategy yields encouraging results with gratifying generalization capability.
arXiv Detail & Related papers (2022-07-17T16:42:31Z)
- DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that Decomposes the Knowledge injection process of Pre-trained Language Models in the pre-training, fine-tuning, and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z)
- Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
- Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering [80.60605604261416]
We propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks.
We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks.
We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
arXiv Detail & Related papers (2020-11-07T22:52:21Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.