Continual Named Entity Recognition without Catastrophic Forgetting
- URL: http://arxiv.org/abs/2310.14541v1
- Date: Mon, 23 Oct 2023 03:45:30 GMT
- Title: Continual Named Entity Recognition without Catastrophic Forgetting
- Authors: Duzhen Zhang, Wei Cong, Jiahua Dong, Yahan Yu, Xiuyi Chen, Yonggang Zhang, Zhen Fang
- Abstract summary: We introduce a pooled feature distillation loss that skillfully navigates the trade-off between retaining knowledge of old entity types and acquiring new ones.
We develop a confidence-based pseudo-labeling for the non-entity type.
We suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Named Entity Recognition (CNER) is a burgeoning area, which
involves updating an existing model by incorporating new entity types
sequentially. Nevertheless, continual learning approaches are often severely
afflicted by catastrophic forgetting. This issue is intensified in CNER due to
the consolidation of old entity types from previous steps into the non-entity
type at each step, leading to what is known as the semantic shift problem of
the non-entity type. In this paper, we introduce a pooled feature distillation
loss that skillfully navigates the trade-off between retaining knowledge of old
entity types and acquiring new ones, thereby more effectively mitigating the
problem of catastrophic forgetting. Additionally, we develop a confidence-based
pseudo-labeling strategy for the non-entity type, i.e., predicting entity types
with the old model to handle the semantic shift of the non-entity type.
Following the pseudo-labeling process, we suggest an adaptive re-weighting
type-balanced learning strategy to handle the issue of biased type
distribution. We carried out comprehensive experiments on ten CNER settings
using three different datasets. The results illustrate that our method
significantly outperforms prior state-of-the-art approaches, registering an
average improvement of 6.3% and 8.0% in Micro and Macro F1 scores,
respectively.
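The abstract does not spell out how the pooled feature distillation loss is computed. As a rough illustration only, the sketch below assumes a PODNet-style scheme: each layer's token-by-hidden feature map from the old and new models is sum-pooled along the token axis and along the hidden axis, and the loss is the mean Euclidean distance between the pooled vectors. The names `pool_features` and `pooled_distillation_loss`, and the choice of sum pooling and L2 distance, are assumptions, not the authors' definitions.

```python
import math
from typing import List

# One layer's features: rows are tokens, columns are hidden units.
Matrix = List[List[float]]


def pool_features(feats: Matrix) -> List[float]:
    """Sum-pool a (seq_len x hidden) feature map along each axis and concatenate.

    Assumption: PODNet-style two-axis pooling adapted to token sequences.
    """
    seq_len, hidden = len(feats), len(feats[0])
    # Pool over tokens: one value per hidden unit.
    over_tokens = [sum(feats[t][h] for t in range(seq_len)) for h in range(hidden)]
    # Pool over hidden units: one value per token.
    over_hidden = [sum(row) for row in feats]
    return over_tokens + over_hidden


def pooled_distillation_loss(old_layers: List[Matrix], new_layers: List[Matrix]) -> float:
    """Mean L2 distance between pooled old-model and new-model features, layer by layer."""
    total = 0.0
    for old, new in zip(old_layers, new_layers):
        p_old, p_new = pool_features(old), pool_features(new)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p_old, p_new)))
    return total / len(old_layers)
```

Because only pooled statistics are matched, the new model can still rearrange individual activations to fit new entity types, which is one plausible reading of the retention/acquisition trade-off the abstract describes. Identical old and new features give a loss of 0.0.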
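A minimal sketch of confidence-based pseudo-labeling as the abstract describes it: tokens labeled with the non-entity type "O" at the current step are scored by the old model, and if the old model predicts an old entity type with probability at least a threshold, that type replaces "O". The single fixed threshold `tau = 0.9`, the dictionary-of-probabilities input, and the function name `pseudo_label` are assumptions for illustration; the paper may set its confidence criterion differently.

```python
from typing import Dict, List


def pseudo_label(labels: List[str], old_probs: List[Dict[str, float]], tau: float = 0.9) -> List[str]:
    """Relabel "O" tokens with the old model's prediction when it is confident.

    labels: current-step labels, where old entity types have collapsed into "O".
    old_probs: per-token distribution from the old model over old types (including "O").
    tau: hypothetical confidence threshold.
    """
    out = []
    for gold, probs in zip(labels, old_probs):
        if gold != "O":
            out.append(gold)  # keep gold labels for the new entity types
            continue
        best = max(probs, key=probs.get)
        if best != "O" and probs[best] >= tau:
            out.append(best)  # confident old-type prediction: recover the old entity
        else:
            out.append("O")  # otherwise treat the token as a true non-entity
    return out
```

For example, with labels `["O", "LOC", "O", "O"]` and an old model that assigns "PER" probability 0.95 to the first token, the result is `["PER", "LOC", "O", "O"]`.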
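The abstract only says the re-weighting is adaptive and type-balanced. One simple stand-in, shown here as an assumption rather than the paper's rule, recomputes inverse-frequency weights from each step's pseudo-labeled token labels (so rare entity types count more than the dominant non-entity type) and plugs them into a weighted negative log-likelihood. `type_weights` and `weighted_nll` are hypothetical names.

```python
from collections import Counter
from math import log
from typing import Dict, List


def type_weights(labels: List[str]) -> Dict[str, float]:
    """Inverse-frequency weight per type, rescaled so the per-type weights average to 1.

    Assumption: recomputed on each step's pseudo-labeled data, which makes it "adaptive".
    """
    counts = Counter(labels)
    raw = {t: len(labels) / c for t, c in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {t: w / mean for t, w in raw.items()}


def weighted_nll(labels: List[str], probs: List[Dict[str, float]], weights: Dict[str, float]) -> float:
    """Type-weighted negative log-likelihood, averaged over tokens."""
    total = sum(-weights[y] * log(p[y]) for y, p in zip(labels, probs))
    return total / len(labels)
```

With three "O" tokens and one "PER" token, `type_weights` gives "O" a weight of about 0.5 and "PER" about 1.5, so a mistake on the rarer type costs three times as much.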
Related papers
- Activate and Reject: Towards Safe Domain Generalization under Category Shift
We study a practical problem of Domain Generalization under Category Shift (DGCS).
DGCS aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
(2023-10-07T07:53:12Z)
- Multivariate Prototype Representation for Domain-Generalized Incremental Learning
We design a domain-generalized class-incremental learning (DGCIL) approach that remembers old classes, adapts to new classes, and can reliably classify objects from unseen domains.
Our loss formulation maintains classification boundaries and suppresses the domain-specific information of each class.
(2023-09-24T06:42:04Z)
- Task Relation Distillation and Prototypical Pseudo Label for Incremental Named Entity Recognition
We propose a method called task Relation Distillation and Prototypical pseudo label (RDP) for incremental NER (INER).
Our method achieves significant improvements over the previous state-of-the-art methods, with an average increase of 6.08% in Micro F1 score and 7.71% in Macro F1 score.
(2023-08-17T05:36:56Z)
- Learning "O" Helps for Learning More: Handling the Concealed Entity Problem for Class-incremental NER
"Unlabeled Entity Problem" leads to severe confusion between "O" and entities.
We propose an entity-aware contrastive learning method that adaptively detects entity clusters in "O".
We introduce a more realistic and challenging benchmark for class-incremental NER.
(2022-10-10T13:26:45Z)
- Distilling Causal Effect from Miscellaneous Other-Class for Continual Named Entity Recognition
Learning Other-Class in the same way as new entity types amplifies the catastrophic forgetting and leads to a substantial performance drop.
We propose a unified causal framework to retrieve the causality from both new entity types and Other-Class.
Experimental results on three benchmark datasets show that our method outperforms the state-of-the-art method by a large margin.
(2022-10-08T09:37:06Z)
- BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data.
To make up for the absence of source data, most existing methods introduce feature-prototype-based pseudo-labeling strategies.
We propose a general class-Balanced Multicentric Dynamic prototype strategy for the SFDA task.
(2022-04-06T13:23:02Z)
- Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation
In this paper, we propose to embed affinity learning of multi-stage approaches in a single-stage model.
A deep neural network is used to deliver comprehensive semantic information in the training phase.
Experiments are conducted on the PASCAL VOC 2012 dataset to evaluate the effectiveness of our proposed approach.
(2021-08-03T07:48:33Z)
- Unsupervised and self-adaptative techniques for cross-domain person re-identification
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
(2021-03-21T23:58:39Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
(2020-12-16T04:09:04Z)