Exploring Stability-Plasticity Trade-offs for Continual Named Entity Recognition
- URL: http://arxiv.org/abs/2508.03259v1
- Date: Tue, 05 Aug 2025 09:35:55 GMT
- Title: Exploring Stability-Plasticity Trade-offs for Continual Named Entity Recognition
- Authors: Duzhen Zhang, Chenxing Li, Jiahua Dong, Qi Liu, Dong Yu
- Abstract summary: We propose a Stability-Plasticity Trade-off (SPT) method for Continual Named Entity Recognition (CNER). From the representation perspective, we introduce a pooling operation into the original KD, permitting a level of plasticity by consolidating representation dimensions. From the weight perspective, we dynamically merge the weights of old and new models, strengthening old knowledge while maintaining new knowledge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Named Entity Recognition (CNER) is an evolving field that focuses on sequentially updating an existing model to incorporate new entity types. Previous CNER methods primarily utilize Knowledge Distillation (KD) to preserve prior knowledge and overcome catastrophic forgetting, strictly ensuring that the representations of old and new models remain consistent. Consequently, they often impart the model with excessive stability (i.e., retention of old knowledge) but limited plasticity (i.e., acquisition of new knowledge). To address this issue, we propose a Stability-Plasticity Trade-off (SPT) method for CNER that balances these aspects from both representation and weight perspectives. From the representation perspective, we introduce a pooling operation into the original KD, permitting a level of plasticity by consolidating representation dimensions. From the weight perspective, we dynamically merge the weights of old and new models, strengthening old knowledge while maintaining new knowledge. During this fusion, we implement a weight-guided selective mechanism to prioritize significant weights. Moreover, we develop a confidence-based pseudo-labeling approach for the current non-entity type, which predicts entity types using the old model to handle the semantic shift of the non-entity type, a challenge specific to CNER that has largely been ignored by previous methods. Extensive experiments across ten CNER settings on three benchmark datasets demonstrate that our SPT method surpasses previous CNER approaches, highlighting its effectiveness in achieving a suitable stability-plasticity trade-off.
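The abstract names three concrete mechanisms: a pooled distillation loss, a weight-guided selective merge of old and new model weights, and confidence-based pseudo-labeling of tokens currently annotated as non-entity. The sketch below illustrates one plausible reading of each in PyTorch. The tensor shapes, the choice of max-pooling, the magnitude-based significance criterion, and the function names and thresholds are all illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of the three SPT components named in the abstract.
# Shapes, pooling choice, merging criterion, and thresholds are assumptions.
import torch
import torch.nn.functional as F


def pooled_kd_loss(old_feats: torch.Tensor, new_feats: torch.Tensor) -> torch.Tensor:
    """Distill pooled representations rather than matching every dimension.

    Consolidating the hidden dimension relaxes the element-wise consistency
    constraint of standard KD, leaving room for plasticity. Assumes features
    of shape (batch, seq_len, hidden_dim); the pooling granularity is a guess.
    """
    old_pooled = old_feats.max(dim=-1).values  # (batch, seq_len)
    new_pooled = new_feats.max(dim=-1).values
    return F.mse_loss(new_pooled, old_pooled.detach())


@torch.no_grad()
def merge_weights(old_model: torch.nn.Module, new_model: torch.nn.Module,
                  alpha: float = 0.5) -> None:
    """Merge old and new weights in place, prioritizing 'significant' weights.

    Significance is approximated here by relative parameter magnitude; the
    abstract does not specify the selective criterion, so this is a stand-in.
    Assumes both models share the same parameter structure.
    """
    for p_old, p_new in zip(old_model.parameters(), new_model.parameters()):
        significance = p_old.abs() / (p_old.abs() + p_new.abs() + 1e-8)
        coef = alpha * significance            # per-weight mixing coefficient
        p_new.mul_(1.0 - coef).add_(coef * p_old)


@torch.no_grad()
def pseudo_label_non_entities(old_model: torch.nn.Module,
                              input_ids: torch.Tensor,
                              labels: torch.Tensor,
                              o_id: int = 0,
                              tau: float = 0.7) -> torch.Tensor:
    """Relabel current non-entity ('O') tokens with confident old-model predictions.

    Tokens annotated as non-entity in the new data may in fact belong to old
    entity types (the semantic shift of the non-entity type); tau is an
    assumed confidence threshold.
    """
    logits = old_model(input_ids)  # assumed to return (batch, seq_len, num_old_types)
    conf, preds = logits.softmax(dim=-1).max(dim=-1)
    mask = (labels == o_id) & (preds != o_id) & (conf > tau)
    return torch.where(mask, preds, labels)
```

In training, `pooled_kd_loss` would presumably be added to the task loss, `pseudo_label_non_entities` would preprocess each batch's labels, and `merge_weights` would run after each incremental step; how SPT actually schedules these pieces is not stated in the abstract.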
Related papers
- Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation [67.80294336559574]
Continual Test-Time Adaptation (CTTA) is a task that requires a source pre-trained model to continually adapt to new scenarios. We propose a novel pipeline, Orthogonal Projection Subspace to aggregate online Prior-knowledge, dubbed OoPk.
arXiv Detail & Related papers (2025-06-23T18:17:39Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning [46.45124762458626]
Backward-compatible representation learning enables updated models to integrate seamlessly with existing ones, avoiding the need to reprocess stored data. We propose switching perspectives to hyperbolic geometry, where we treat time as a natural axis for capturing a model's confidence and evolution. Experiments validate the superiority of the proposed method in achieving compatibility, paving the way for more resilient and adaptable machine learning systems.
arXiv Detail & Related papers (2025-06-06T07:53:40Z)
- Continual Learning in Vision-Language Models via Aligned Model Merging [84.47520899851557]
We present a new perspective based on model merging to maintain stability while still retaining plasticity. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning weights aligned with those of previous models.
arXiv Detail & Related papers (2025-05-30T20:52:21Z)
- Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models. We introduce Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness. Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image-editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z)
- BECAME: BayEsian Continual Learning with Adaptive Model MErging [21.642774366793997]
We introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Our approach outperforms state-of-the-art CL methods and existing merging strategies.
arXiv Detail & Related papers (2025-04-03T15:07:28Z)
- A Retention-Centric Framework for Continual Learning with Guaranteed Model Developmental Safety [75.8161094916476]
In real-world applications, learning-enabled systems often undergo iterative model development to address challenging or emerging tasks. Acquiring new capabilities or improving existing ones may inadvertently cause the model to lose good capabilities of the old model, a phenomenon known as catastrophic forgetting. We propose a retention-centric framework with data-dependent constraints, and study how to continually develop a pretrained CLIP model for acquiring new or improving existing capabilities of image classification.
arXiv Detail & Related papers (2024-10-04T22:34:58Z)
- Weighted Ensemble Models Are Strong Continual Learners [20.62749699589017]
We study the problem of continual learning (CL), where the goal is to learn a model on a sequence of tasks. CL is essentially a balancing act between being able to learn the new task and maintaining performance on previously learned concepts. Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks.
arXiv Detail & Related papers (2023-12-14T14:26:57Z)
- Continual Named Entity Recognition without Catastrophic Forgetting [37.316700599440935]
We introduce a pooled feature distillation loss that skillfully navigates the trade-off between retaining knowledge of old entity types and acquiring new ones. We develop a confidence-based pseudo-labeling approach for the non-entity type. We suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution.
arXiv Detail & Related papers (2023-10-23T03:45:30Z)
- SRIL: Selective Regularization for Class-Incremental Learning [5.810252620242912]
Class-Incremental Learning aims to create an integrated model that balances plasticity and stability to overcome the challenge of catastrophic forgetting.
We propose a selective regularization method that accepts new knowledge while maintaining previous knowledge.
We validate the effectiveness of the proposed method through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.
arXiv Detail & Related papers (2023-05-09T05:04:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.