Task Relation Distillation and Prototypical Pseudo Label for Incremental Named Entity Recognition
- URL: http://arxiv.org/abs/2308.08793v1
- Date: Thu, 17 Aug 2023 05:36:56 GMT
- Title: Task Relation Distillation and Prototypical Pseudo Label for Incremental Named Entity Recognition
- Authors: Duzhen Zhang, Hongliu Li, Wei Cong, Rongtao Xu, Jiahua Dong, Xiuyi Chen
- Abstract summary: We propose a method called task Relation Distillation and Prototypical pseudo label (RDP) for INER.
Our method achieves significant improvements over the previous state-of-the-art methods, with an average increase of 6.08% in Micro F1 score and 7.71% in Macro F1 score.
- Score: 23.69922938823477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incremental Named Entity Recognition (INER) involves the sequential learning
of new entity types without accessing the training data of previously learned
types. However, INER faces the challenge of catastrophic forgetting specific
to incremental learning, further aggravated by background shift (i.e., old and
future entity types are labeled as the non-entity type in the current task). To
address these challenges, we propose a method called task Relation Distillation
and Prototypical pseudo label (RDP) for INER. Specifically, to tackle
catastrophic forgetting, we introduce a task relation distillation scheme that
serves two purposes: 1) ensuring inter-task semantic consistency across
different incremental learning tasks by minimizing inter-task relation
distillation loss, and 2) enhancing the model's prediction confidence by
minimizing intra-task self-entropy loss. Simultaneously, to mitigate background
shift, we develop a prototypical pseudo label strategy that distinguishes old
entity types from the current non-entity type using the old model. This
strategy generates high-quality pseudo labels by measuring the distances
between token embeddings and type-wise prototypes. We conducted extensive
experiments on ten INER settings of three benchmark datasets (i.e., CoNLL2003,
I2B2, and OntoNotes5). The results demonstrate that our method achieves
significant improvements over the previous state-of-the-art methods, with an
average increase of 6.08% in Micro F1 score and 7.71% in Macro F1 score.
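
Since the abstract describes the two mechanisms only at a high level, the sketch below illustrates one way they might be realized: type-wise prototypes are built from old-model token embeddings, current non-entity tokens are relabeled by their distance to those prototypes, and an intra-task self-entropy term sharpens prediction confidence. This is not the authors' implementation; the function names, the plain Euclidean distance, and the `dist_threshold` gate are assumptions, and the self-entropy term is shown in its generic form.

```python
import torch
import torch.nn.functional as F

def build_prototypes(embeddings, labels, num_old_types):
    """Mean old-model token embedding per old entity type (type 0 = non-entity is skipped)."""
    protos = []
    for t in range(1, num_old_types):
        mask = labels == t
        protos.append(embeddings[mask].mean(dim=0) if mask.any()
                      else embeddings.new_zeros(embeddings.size(-1)))
    return torch.stack(protos)  # (num_old_types - 1, hidden_dim)

def prototypical_pseudo_labels(embeddings, current_labels, old_logits, protos,
                               non_entity_id=0, dist_threshold=2.0):
    """Relabel tokens marked non-entity in the current task with the nearest old-type prototype."""
    pseudo = current_labels.clone()
    old_pred = old_logits.argmax(dim=-1)
    dists = torch.cdist(embeddings, protos)      # (num_tokens, num_old_types - 1)
    nearest_dist, nearest_type = dists.min(dim=-1)
    nearest_type = nearest_type + 1              # prototype index 0 corresponds to type 1
    # Only overwrite tokens that the current task labels as non-entity, that the old model
    # predicts as an old entity type, and that lie close enough to the matched prototype.
    candidate = ((current_labels == non_entity_id)
                 & (old_pred != non_entity_id)
                 & (nearest_dist < dist_threshold))
    pseudo[candidate] = nearest_type[candidate]
    return pseudo

def intra_task_self_entropy(logits):
    """Self-entropy of the current model's predictions; minimizing it sharpens confidence."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
```

In a training step along these lines, the pseudo labels would be computed with the frozen old model, and the standard NER loss on those labels would then be combined with the distillation and self-entropy terms.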
Related papers
- Towards Robust Incremental Learning under Ambiguous Supervision [22.9111210739047]
We propose a novel weakly-supervised learning paradigm called Incremental Partial Label Learning (IPLL).
IPLL aims to handle sequential fully-supervised learning problems where novel classes emerge from time to time.
We develop a memory replay technique that collects well-disambiguated samples while maintaining representativeness and diversity.
arXiv Detail & Related papers (2025-01-23T11:52:53Z)
- Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment [62.73503467108322]
This topic is widely studied in 3D point cloud segmentation due to the difficulty of annotating point clouds densely.
Until recently, pseudo-labels have been widely employed to facilitate training with limited ground-truth labels.
Existing pseudo-labeling approaches can suffer heavily from the noise and variations in unlabelled data.
We propose a novel learning strategy to regularize the pseudo-labels generated for training, thus effectively narrowing the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2024-08-29T13:31:15Z)
- Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation [18.984447545932706]
"catastrophic forgetting" problem occurs when model forgets previously learned features when it is extended to new categories or tasks.
We propose a network by introducing the data-specific Mixture of Experts structure to handle the new tasks or categories.
We validate our method on both class-level and task-level continual learning challenges.
arXiv Detail & Related papers (2024-06-19T14:19:50Z)
- Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning the model with a cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
- Continual Named Entity Recognition without Catastrophic Forgetting [37.316700599440935]
We introduce a pooled feature distillation loss that skillfully navigates the trade-off between retaining knowledge of old entity types and acquiring new ones.
We develop a confidence-based pseudo-labeling strategy for the non-entity type.
We suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution.
arXiv Detail & Related papers (2023-10-23T03:45:30Z)
- Deep Graph Reprogramming [112.34663053130073]
"Deep graph reprogramming" is a model reusing task tailored for graph neural networks (GNNs)
We propose an innovative Data Reprogramming paradigm alongside a Model Reprogramming paradigm.
arXiv Detail & Related papers (2023-04-28T02:04:29Z)
- Hierarchical Prototype Networks for Continual Graph Representation Learning [90.78466005753505]
We present Hierarchical Prototype Networks (HPNs) which extract different levels of abstract knowledge in the form of prototypes to represent the continuously expanded graphs.
We show that HPNs not only outperform state-of-the-art baseline techniques but also consume relatively less memory.
arXiv Detail & Related papers (2021-11-30T14:15:14Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and a Sequential Denoising Auto-Encoder (TSDAE).
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
arXiv Detail & Related papers (2021-04-14T17:02:18Z)
- Lifelong Learning Without a Task Oracle [13.331659934508764]
Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned.
We propose and compare several candidate task-assigning mappers which require very little memory overhead.
The best-performing variants impose an average parameter-memory overhead of only 1.7%.
arXiv Detail & Related papers (2020-11-09T21:30:31Z)