BERTnesia: Investigating the capture and forgetting of knowledge in BERT
- URL: http://arxiv.org/abs/2106.02902v1
- Date: Sat, 5 Jun 2021 14:23:49 GMT
- Title: BERTnesia: Investigating the capture and forgetting of knowledge in BERT
- Authors: Jonas Wallat, Jaspreet Singh, Avishek Anand
- Abstract summary: We probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory.
Our findings show that knowledge is not just contained in BERT's final layers.
When BERT is fine-tuned, relational knowledge is forgotten.
- Score: 7.304523502384361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probing complex language models has recently revealed several insights into
linguistic and semantic patterns found in the learned representations. In this
article, we probe BERT specifically to understand and measure the relational
knowledge it captures in its parametric memory. While probing for linguistic
understanding is commonly applied to all layers of BERT as well as fine-tuned
models, this has not been done for factual knowledge. We utilize existing
knowledge base completion tasks (LAMA) to probe every layer of pre-trained as
well as fine-tuned BERT models (ranking, question answering, NER). Our findings
show that knowledge is not just contained in BERT's final layers. Intermediate
layers contribute a significant amount (17-60%) to the total knowledge found.
Probing intermediate layers also reveals how different types of knowledge
emerge at varying rates. When BERT is fine-tuned, relational knowledge is
forgotten. The extent of forgetting is impacted by the fine-tuning objective
and the training data. We found that ranking models forget the least and retain
more knowledge in their final layer compared to masked language modeling and
question answering. However, masked language modeling performed the best at
acquiring new knowledge from the training data. When it comes to learning
facts, we found that capacity and fact density are key factors. We hope this
initial work will spur further research into understanding the parametric
memory of language models and the effect of training objectives on factual
knowledge. The code to repeat the experiments is publicly available on GitHub.
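To make the layer-wise probing idea concrete, the sketch below applies a LAMA-style cloze query to every layer of a pre-trained BERT by reusing the masked-language-modeling head on each intermediate hidden state. It is a minimal illustration assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the paper's actual probing templates, relations, and evaluation metrics differ in detail.

```python
# Minimal layer-wise LAMA-style probe for BERT (illustrative sketch only).
# Assumes the Hugging Face transformers library and bert-base-uncased; the
# paper's actual probing setup and metrics are not reproduced here.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# A cloze-style relational query, as in the LAMA benchmark.
query = "The capital of France is [MASK]."
inputs = tokenizer(query, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    # Run the encoder once and keep the hidden states of every layer.
    outputs = model.bert(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; 1..12 are the transformer layers.
for layer, hidden in enumerate(outputs.hidden_states[1:], start=1):
    logits = model.cls(hidden)  # reuse the pre-trained MLM head as the probe
    predicted_id = logits[0, mask_index].argmax().item()
    print(f"layer {layer:2d}: {tokenizer.decode([predicted_id])}")
```

Swapping in a fine-tuned checkpoint and comparing its per-layer predictions against the pre-trained model is one simple way to observe the kind of forgetting the abstract describes.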
Related papers
- Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models [20.157061521694096]
This study investigates the differences between entity and relational knowledge through knowledge editing.
To further elucidate the differences between entity and relational knowledge, we employ causal analysis to investigate how relational knowledge is stored in pre-trained models.
This insight highlights the multifaceted nature of knowledge storage in language models, underscoring the complexity of manipulating specific types of knowledge within these models.
arXiv Detail & Related papers (2024-09-01T05:09:11Z) - How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study [27.23388511249688]
This paper investigates the layer-wise capability of large language models to encode knowledge.
We leverage the powerful generative capability of ChatGPT to construct probing datasets.
Experiments on conflicting and newly acquired knowledge show that LLMs prefer to encode more context knowledge in the upper layers.
arXiv Detail & Related papers (2024-02-25T11:15:42Z) - Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with a differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z) - Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z) - Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay at all.
arXiv Detail & Related papers (2023-03-02T09:03:43Z) - Knowledge Graph Fusion for Language Model Fine-tuning [0.0]
We investigate the benefits of knowledge incorporation into the fine-tuning stages of BERT.
An existing K-BERT model, which enriches sentences with triplets from a Knowledge Graph, is adapted for the English language.
Changes made to K-BERT for accommodating the English language also extend to other word-based languages.
arXiv Detail & Related papers (2022-06-21T08:06:22Z) - Finding patterns in Knowledge Attribution for Transformers [1.52292571922932]
We use a 12-layer multi-lingual BERT model for our experiments.
We observe that factual knowledge can mostly be attributed to the middle and higher layers of the network.
Applying the attribution scheme for grammatical knowledge, we find that grammatical knowledge is far more dispersed among the neurons than factual knowledge.
arXiv Detail & Related papers (2022-05-03T08:30:51Z) - Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z) - BERTnesia: Investigating the capture and forgetting of knowledge in BERT [5.849736173068868]
We probe BERT specifically to understand and measure the relational knowledge it captures.
Intermediate layers contribute a significant amount (17-60%) to the total knowledge found.
When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective.
arXiv Detail & Related papers (2020-10-19T08:46:30Z) - CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representations for both language and knowledge with an extended masked language modeling objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z) - Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.