Degradation of Feature Space in Continual Learning
- URL: http://arxiv.org/abs/2602.06586v1
- Date: Fri, 06 Feb 2026 10:26:34 GMT
- Title: Degradation of Feature Space in Continual Learning
- Authors: Chiara Lanza, Roberto Pereira, Marco Miozzo, Eduard Angelats, Paolo Dini
- Abstract summary: We investigate whether promoting feature-space isotropy can enhance representation quality in continual learning. We find that isotropic regularization fails to improve, and can in fact degrade, model accuracy in continual settings.
- Score: 2.322400467239964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Centralized training is the standard paradigm in deep learning, enabling models to learn from a unified dataset in a single location. In such a setup, isotropic feature distributions naturally arise as a means to support well-structured and generalizable representations. In contrast, continual learning operates on streaming and non-stationary data, and trains models incrementally, inherently facing the well-known plasticity-stability dilemma. In such settings, the learning dynamics tend to yield an increasingly anisotropic feature space. This raises a fundamental question: should isotropy be enforced to achieve a better balance between stability and plasticity, and thereby mitigate catastrophic forgetting? In this paper, we investigate whether promoting feature-space isotropy can enhance representation quality in continual learning. Through experiments using contrastive continual learning techniques on the CIFAR-10 and CIFAR-100 datasets, we find that isotropic regularization fails to improve, and can in fact degrade, model accuracy in continual settings. Our results highlight essential differences in feature geometry between centralized and continual learning, suggesting that isotropy, while beneficial in centralized setups, may not constitute an appropriate inductive bias for non-stationary learning scenarios.
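The notion of feature-space isotropy the abstract builds on can be made concrete with a small sketch (illustrative only, not the paper's actual method): isotropy is measured here through the eigenvalue spectrum of the feature covariance, and a simple penalty term pushes that spectrum toward uniformity.

```python
import numpy as np

def isotropy_score(features: np.ndarray) -> float:
    """Ratio of smallest to largest eigenvalue of the feature covariance.
    1.0 means a perfectly isotropic (spherical) feature distribution;
    values near 0 indicate strong anisotropy."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return float(eigvals[0] / eigvals[-1])

def isotropy_penalty(features: np.ndarray) -> float:
    """A simple regularizer penalizing anisotropy: the variance of the
    covariance eigenvalues, which is zero for an isotropic distribution."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(features) - 1)
    return float(np.var(np.linalg.eigvalsh(cov)))

# Isotropic Gaussian features score close to 1; stretching one
# dimension makes the representation strongly anisotropic.
rng = np.random.default_rng(0)
iso = rng.standard_normal((2000, 8))
aniso = iso * np.array([10.0, 1, 1, 1, 1, 1, 1, 1])
print(isotropy_score(iso), isotropy_score(aniso))
```

A penalty like this could be added to a contrastive loss to enforce isotropy; the paper's finding is precisely that such enforcement can hurt accuracy in continual settings.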
Related papers
- Stable Deep Reinforcement Learning via Isotropic Gaussian Representations [19.912439771541568]
We show that isotropic Gaussian embeddings are provably advantageous under non-stationary targets. We propose the use of Sketched Isotropic Gaussian Regularization for shaping representations toward an isotropic Gaussian distribution.
arXiv Detail & Related papers (2026-02-22T22:55:27Z)
- Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs [5.96875296117642]
We show that stable parameter trajectories lead to stationary solutions that minimize the forward KL divergence to the empirical distribution. We empirically validate this effect using a controlled feedback-based training framework. This indicates that optimization stability and generative expressivity are not inherently aligned, and that stability alone is an insufficient indicator of generative quality.
arXiv Detail & Related papers (2026-01-26T15:34:50Z)
- Training instability in deep learning follows low-dimensional dynamical principles [24.97566911521709]
Training unfolds as a high-dimensional dynamical system in which small perturbations to optimization, data, parameters, or learning signals can induce abrupt and irreversible collapse. We propose a unified dynamical perspective that characterizes training stability as an intrinsic property of learning systems.
arXiv Detail & Related papers (2026-01-19T15:37:45Z)
- Entropy-Guided Token Dropout: Training Autoregressive Language Models with Limited Domain Data [89.96277093034547]
We introduce EntroDrop, an entropy-guided token dropout method that functions as structured data regularization. We show that EntroDrop consistently outperforms standard regularization baselines and maintains robust performance throughout extended multi-epoch training.
arXiv Detail & Related papers (2025-12-29T12:35:51Z)
- Activation Function Design Sustains Plasticity in Continual Learning [1.618563064839635]
In continual learning, models can progressively lose the ability to adapt. We show that activation choice is a primary, architecture-agnostic lever for mitigating plasticity loss.
arXiv Detail & Related papers (2025-09-26T16:41:47Z)
- SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting [68.00007494819798]
Continual learning (CL) requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL: it partitions the gradient space of previously learned tasks into two subspaces. New tasks are learned within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to partition the gradient space appropriately.
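The gradient-projection idea described in this blurb can be sketched in a few lines (assumed names, not the SplitLoRA implementation): past-task gradients are decomposed via SVD into a major subspace, and a new-task gradient is projected onto its complement before the update.

```python
import numpy as np

def major_subspace(old_grads: np.ndarray, energy: float = 0.9) -> np.ndarray:
    """Top right-singular directions of past-task gradients (rows of
    `old_grads`, one flattened gradient per task/batch) capturing at
    least `energy` of the squared singular-value mass."""
    _, s, vt = np.linalg.svd(old_grads, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, energy)) + 1
    return vt[:k]  # (k, n_params), orthonormal rows

def project_out(grad: np.ndarray, major: np.ndarray) -> np.ndarray:
    """Remove the component of a new-task gradient that lies in the
    major subspace, so the update interferes less with old tasks."""
    return grad - major.T @ (major @ grad)

# Past gradients span the first two coordinate directions; the new
# gradient's component along them is removed.
old = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
new_grad = np.array([1.0, 0, 1.0, 0])
print(project_out(new_grad, major_subspace(old)))  # ≈ [0, 0, 1, 0]
```

The `energy` threshold is the knob the blurb alludes to: set too high, the minor subspace shrinks and plasticity suffers; set too low, old tasks are interfered with.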
arXiv Detail & Related papers (2025-05-28T13:57:56Z)
- Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent, data by data. The gradient updates of existing methods can heavily degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
- Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning [33.560003528712414]
Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data.
This poses a challenge in striking a balance between stability and plasticity when adapting to new information.
We propose Branch-tuning, an efficient and straightforward method that achieves a balance between stability and plasticity in continual SSL.
arXiv Detail & Related papers (2024-03-27T05:38:48Z)
- Incorporating Neuro-Inspired Adaptability for Continual Learning in Artificial Intelligence [59.11038175596807]
Continual learning aims to empower artificial intelligence with strong adaptability to the real world.
Existing advances mainly focus on preserving memory stability to overcome catastrophic forgetting.
We propose a generic approach that appropriately attenuates old memories in parameter distributions to improve learning plasticity.
arXiv Detail & Related papers (2023-08-29T02:43:58Z)
- New Insights on Relieving Task-Recency Bias for Online Class Incremental Learning [37.888061221999294]
Of all settings, online class incremental learning (OCIL) is the most challenging and the most frequently encountered in the real world.
To strike a preferable trade-off between stability and plasticity, we propose an Adaptive Focus Shifting algorithm.
arXiv Detail & Related papers (2023-02-16T11:52:00Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
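The stabilisation claim can be illustrated on a toy bilinear game (a standard textbook example, not this paper's experiment): for min-max x*y, the continuous-time gradient field is a pure rotation, so simultaneous gradient descent-ascent (explicit Euler) spirals outward, while a higher-order Runge-Kutta integrator tracks the bounded trajectory closely.

```python
import numpy as np

def field(z: np.ndarray) -> np.ndarray:
    """Gradient field of the bilinear game min_x max_y x*y:
    x' = -dL/dx = -y,  y' = +dL/dy = x (a pure rotation)."""
    x, y = z
    return np.array([-y, x])

def euler_step(f, z, h):
    return z + h * f(z)  # simultaneous gradient descent-ascent

def rk4_step(f, z, h):
    """One classical fourth-order Runge-Kutta step for z' = f(z)."""
    k1 = f(z)
    k2 = f(z + h / 2 * k1)
    k3 = f(z + h / 2 * k2)
    k4 = f(z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 100
z_euler = z_rk4 = np.array([1.0, 0.0])
for _ in range(steps):
    z_euler = euler_step(field, z_euler, h)
    z_rk4 = rk4_step(field, z_rk4, h)

# Euler drifts away from the equilibrium; RK4 stays near the unit circle.
print(np.linalg.norm(z_euler), np.linalg.norm(z_rk4))
```

Euler's amplification factor per step is |1 + ih| > 1, which is exactly the integration-error instability the paper hypothesises; RK4's is approximately 1 for small h.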
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- Learning perturbation sets for robust machine learning [97.6757418136662]
We use a conditional generator that defines the perturbation set over a constrained region of the latent space.
We measure the quality of our learned perturbation sets both quantitatively and qualitatively.
We leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations.
arXiv Detail & Related papers (2020-07-16T16:39:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.