BERTnesia: Investigating the capture and forgetting of knowledge in BERT
- URL: http://arxiv.org/abs/2010.09313v2
- Date: Wed, 8 Sep 2021 13:54:02 GMT
- Title: BERTnesia: Investigating the capture and forgetting of knowledge in BERT
- Authors: Jonas Wallat, Jaspreet Singh, Avishek Anand
- Abstract summary: We probe BERT specifically to understand and measure the relational knowledge it captures.
Intermediate layers contribute a significant amount (17-60%) to the total knowledge found.
When BERT is fine-tuned, relational knowledge is forgotten, but the extent of forgetting depends on the fine-tuning objective.
- Score: 5.849736173068868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probing complex language models has recently revealed several insights into
linguistic and semantic patterns found in the learned representations. In this
paper, we probe BERT specifically to understand and measure the relational
knowledge it captures. We utilize knowledge base completion tasks to probe
every layer of pre-trained as well as fine-tuned BERT (ranking, question
answering, NER). Our findings show that knowledge is not just contained in
BERT's final layers. Intermediate layers contribute a significant amount
(17-60%) to the total knowledge found. Probing intermediate layers also reveals
how different types of knowledge emerge at varying rates. When BERT is
fine-tuned, relational knowledge is forgotten; the extent of forgetting is
determined by the fine-tuning objective rather than by the size of the dataset.
We found that ranking models forget the least and retain more knowledge in their
final layer. We release our code on GitHub so that the experiments can be
reproduced.
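As a rough illustration of the layer-wise probing described above, the sketch below runs a cloze-style relational query through BERT and applies the masked-language-model head to every hidden layer rather than only the last one. The checkpoint name, the example query, and the reuse of the frozen pre-trained MLM head in place of the per-layer probes trained in the paper are illustrative assumptions, not details drawn from the abstract.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Illustrative choices (not taken from the paper): checkpoint and cloze query.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

query = f"Paris is the capital of {tokenizer.mask_token}."  # cloze-style relational query
inputs = tokenizer(query, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embedding output + one tensor per layer

# Reuse the pre-trained MLM head on every layer's hidden states (a stand-in for the
# per-layer probes trained in the paper) and report the top prediction for the
# masked object slot at each depth.
for depth, layer_hidden in enumerate(hidden_states):
    logits = model.cls(layer_hidden)           # MLM head: hidden states -> vocab logits
    top_id = logits[0, mask_pos].argmax().item()
    print(f"layer {depth:2d}: {tokenizer.decode([top_id])}")
```

In the paper's actual setup a separate probe is trained on top of each layer; reusing the frozen MLM head here merely keeps the sketch self-contained.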
Related papers
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z) - BERTnesia: Investigating the capture and forgetting of knowledge in BERT [7.304523502384361]
We probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory.
Our findings show that knowledge is not just contained in BERT's final layers.
When BERT is fine-tuned, relational knowledge is forgotten.
arXiv Detail & Related papers (2021-06-05T14:23:49Z) - Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import knowledge from multiple models into the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z) - Towards Interpreting BERT for Reading Comprehension Based QA [19.63539594339302]
BERT and its variants have achieved state-of-the-art performance in various NLP tasks.
In this work, we attempt to interpret BERT for reading comprehension based question answering.
We observe that the initial layers focus on query-passage interaction, whereas later layers focus more on contextual understanding and enhancing the answer prediction.
arXiv Detail & Related papers (2020-10-18T13:33:49Z) - Layer-wise Guided Training for BERT: Learning Incrementally Refined
Document Representations [11.46458298316499]
We propose a novel approach to fine-tune BERT in a structured manner.
Specifically, we focus on Large-Scale Multi-Label Text Classification (LMTC).
Our approach guides specific BERT layers to predict labels from specific hierarchy levels.
arXiv Detail & Related papers (2020-10-12T14:56:22Z) - CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representation for both language and knowledge with the extended objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z) - Common Sense or World Knowledge? Investigating Adapter-Based Knowledge
Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z) - What's so special about BERT's layers? A closer look at the NLP pipeline
in monolingual and multilingual models [18.155121103400333]
We probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks.
Through a deeper analysis of part-of-speech tagging, we show that, even within a given task, information is spread over different parts of the network.
arXiv Detail & Related papers (2020-04-14T13:41:48Z) - Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm, the BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z) - BERT's output layer recognizes all hidden layers? Some Intriguing
Phenomena and a simple way to boost BERT [53.63288887672302]
Bidirectional Encoder Representations from Transformers (BERT) have achieved tremendous success in many natural language processing (NLP) tasks.
We find that, surprisingly, the output layer of BERT can reconstruct the input sentence when it directly takes each hidden layer of BERT as input.
We propose a quite simple method to boost the performance of BERT.
arXiv Detail & Related papers (2020-01-25T13:35:34Z)
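To illustrate the observation in the last entry above ("BERT's output layer recognizes all hidden layers?"), a similar layer-wise sketch feeds each hidden layer directly through BERT's final MLM output head and decodes every position, so the decoded tokens can be compared against the input sentence. The checkpoint and the example sentence are illustrative assumptions, not details from that paper.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Illustrative checkpoint and sentence (not taken from the cited paper).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "Probing reveals where knowledge lives inside BERT."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embedding output + one tensor per layer

# Pass each layer's hidden states through the final MLM output head and decode
# all positions, to see how faithfully the original sentence is reconstructed.
for depth, layer_hidden in enumerate(hidden_states):
    token_ids = model.cls(layer_hidden)[0].argmax(dim=-1)
    print(f"layer {depth:2d}: {tokenizer.decode(token_ids, skip_special_tokens=True)}")
```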