Investigating Forgetting in Pre-Trained Representations Through
Continual Learning
- URL: http://arxiv.org/abs/2305.05968v1
- Date: Wed, 10 May 2023 08:27:59 GMT
- Title: Investigating Forgetting in Pre-Trained Representations Through
Continual Learning
- Authors: Yun Luo, Zhen Yang, Xuefeng Bai, Fandong Meng, Jie Zhou, Yue Zhang
- Abstract summary: We study the effect of representation forgetting on the generality of pre-trained language models.
We find that generality is degraded in various pre-trained LMs and that syntactic and semantic knowledge is forgotten during continual learning.
- Score: 51.30807066570425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representation forgetting refers to the drift of contextualized
representations during continual training. Intuitively, the representation
forgetting can influence the general knowledge stored in pre-trained language
models (LMs), but the concrete effect is still unclear. In this paper, we study
the effect of representation forgetting on the generality of pre-trained
language models, i.e., their potential to tackle future downstream
tasks. Specifically, we design three metrics, including overall generality
destruction (GD), syntactic knowledge forgetting (SynF), and semantic knowledge
forgetting (SemF), to measure the evolution of general knowledge in continual
learning. Through extensive experiments, we find that generality is degraded
in various pre-trained LMs and that syntactic and semantic knowledge is
forgotten during continual learning. Based on our experiments and analysis, we
further derive two insights for alleviating general knowledge forgetting: 1)
training on general linguistic tasks first can mitigate general knowledge
forgetting; 2) a hybrid continual learning method mitigates generality
destruction and maintains more general knowledge than methods that rely only on
rehearsal or regularization.
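To make the second insight concrete, here is a minimal sketch of how a hybrid continual-learning update might combine rehearsal with a regularization penalty. The replay-buffer interface, the loss choices, and the penalty weight are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def hybrid_cl_step(model, old_model, batch, replay_buffer, optimizer, reg_weight=0.1):
    """One illustrative hybrid continual-learning step: rehearsal + regularization.

    Hypothetical sketch, not the paper's method. `model` is the LM being
    continually trained, `old_model` is a frozen copy taken before the current
    task, and `replay_buffer` (assumed interface) stores examples from earlier tasks.
    """
    optimizer.zero_grad()

    # Loss on the current task's batch.
    inputs, labels = batch
    loss = F.cross_entropy(model(inputs), labels)

    # Rehearsal: replay a small batch of stored examples from previous tasks.
    if len(replay_buffer) > 0:
        old_inputs, old_labels = replay_buffer.sample(batch_size=16)
        loss = loss + F.cross_entropy(model(old_inputs), old_labels)

    # Regularization: keep parameters close to the pre-task weights
    # (a simple L2 anchor in the spirit of EWC-style penalties).
    reg = sum((p - q).pow(2).sum()
              for p, q in zip(model.parameters(), old_model.parameters()))
    loss = loss + reg_weight * reg

    loss.backward()
    optimizer.step()
    return loss.item()
```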
Related papers
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
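As a rough illustration of how those two self-supervised objectives could be combined during refinement, consider the sketch below; the encoder/head attributes and tensor shapes are hypothetical and do not reflect the framework's released code.

```python
import torch
import torch.nn.functional as F

def commonsense_refinement_loss(lm, infill_batch, relation_batch, rel_weight=1.0):
    """Illustrative combination of commonsense mask infilling and commonsense
    relation prediction (hypothetical model attributes and tensors).

    infill_batch: (masked_input_ids, target_ids) where non-masked positions in
                  target_ids are set to -100 so they are ignored by the loss.
    relation_batch: (pair_input_ids, relation_labels) for relation prediction.
    """
    # Commonsense mask infilling: predict the masked concept tokens.
    masked_ids, targets = infill_batch
    token_logits = lm.lm_head(lm.encoder(masked_ids))          # (batch, seq, vocab)
    infill_loss = F.cross_entropy(token_logits.transpose(1, 2), targets,
                                  ignore_index=-100)

    # Commonsense relation prediction: classify the relation between two concepts.
    pair_ids, rel_labels = relation_batch
    rel_logits = lm.relation_head(lm.encoder(pair_ids)[:, 0])  # pooled first token
    rel_loss = F.cross_entropy(rel_logits, rel_labels)

    return infill_loss + rel_weight * rel_loss
```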
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Unveiling the Tapestry: the Interplay of Generalization and Forgetting in Continual Learning [18.61040106667249]
In AI, generalization refers to a model's ability to perform well on out-of-distribution data related to a given task, beyond the data it was trained on.
Continual learning methods often include mechanisms to mitigate catastrophic forgetting, ensuring that knowledge from earlier tasks is retained.
We introduce a simple and effective technique, Shape-Texture Consistency Regularization (STCR), designed for continual learning.
arXiv Detail & Related papers (2022-11-21T04:36:24Z)
- Contextualization and Generalization in Entity and Relation Extraction [0.0]
We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks exhibit substantial lexical overlap between the mentions and relations used for training and those used for evaluation.
We propose empirical studies to separate performance based on mention and relation overlap with the training set.
arXiv Detail & Related papers (2022-06-15T14:16:42Z)
- Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge [91.15301779076187]
We introduce verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences.
We find generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
arXiv Detail & Related papers (2021-12-16T03:13:04Z)
- Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format.
This approach improves performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
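A minimal sketch of this prompting loop, assuming a Hugging Face text-generation pipeline; the prompt template and the stand-in model are assumptions for illustration only, not the paper's released prompts.

```python
# Sketch of generated knowledge prompting with an off-the-shelf LM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

KNOWLEDGE_PROMPT = (
    "Generate some knowledge about the concepts in the input.\n"
    "Input: {question}\nKnowledge:"
)

def answer_with_generated_knowledge(question, num_statements=3):
    # Step 1: sample knowledge statements from the LM with a generic prompt.
    prompt = KNOWLEDGE_PROMPT.format(question=question)
    outputs = generator(prompt, max_new_tokens=40,
                        num_return_sequences=num_statements, do_sample=True)
    statements = [o["generated_text"][len(prompt):].strip() for o in outputs]

    # Step 2: prepend each statement to the question; a reasoning model (the same
    # or a finetuned one) would then score/answer each augmented input.
    return [f"{s} {question}" for s in statements]
```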
arXiv Detail & Related papers (2021-10-15T21:58:03Z)
- Continual Learning for Text Classification with Information Disentanglement Based Regularization [18.258948837964724]
We propose an information disentanglement based regularization method for continual learning on text classification.
Experiments conducted on large-scale benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-04-12T14:17:43Z)
- K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining [5.178964604577459]
We focus on improving model pretraining by leveraging explicit knowledge.
To be specific, we first match knowledge facts from a knowledge graph (KG) and then add a knowledge injection layer directly to the transformer.
The experimental results show that simply adding external knowledge to the transformer can improve learning performance on many NLP tasks.
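The description suggests a layer that fuses matched KG facts into the transformer's hidden states; a hypothetical sketch of such a knowledge injection layer (the fusion scheme and dimensions are assumed, not K-XLNet's actual design) might look like this:

```python
import torch
import torch.nn as nn

class KnowledgeInjectionLayer(nn.Module):
    """Hypothetical knowledge injection layer: hidden states attend over
    embeddings of matched KG facts and the result is fused back in."""

    def __init__(self, hidden_size, fact_dim, num_heads=8):
        super().__init__()
        self.fact_proj = nn.Linear(fact_dim, hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden_states, fact_embeddings):
        # hidden_states:   (batch, seq_len, hidden_size) from the transformer
        # fact_embeddings: (batch, num_facts, fact_dim) for the matched KG facts
        facts = self.fact_proj(fact_embeddings)
        injected, _ = self.attn(hidden_states, facts, facts)
        # Residual fusion keeps the original contextual representation dominant.
        return self.norm(hidden_states + injected)
```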
arXiv Detail & Related papers (2021-03-25T06:14:18Z)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
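For reference, this is the standard bottleneck adapter pattern such models typically rely on: only the small adapter modules are trained on the ConceptNet/OMCS knowledge while the BERT backbone stays frozen. The sizes below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Only the adapter parameters are updated on the knowledge corpus;
        # the surrounding transformer layer is kept frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```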
arXiv Detail & Related papers (2020-05-24T15:49:57Z)