Related papers: Understanding Catastrophic Interference: On the Identifibility of Latent Representations

URL: http://arxiv.org/abs/2509.23027v3
Date: Tue, 07 Oct 2025 12:58:54 GMT
Title: Understanding Catastrophic Interference: On the Identifibility of Latent Representations
Authors: Yuke Li, Yujia Zheng, Tianyi Xiong, Zhenyi Wang, Heng Huang
Abstract summary: Catastrophic interference, also known as catastrophic forgetting, is a fundamental challenge in machine learning. We propose a novel theoretical framework that formulates catastrophic interference as an identification problem. Our approach provides both theoretical guarantees and practical performance improvements across both synthetic and benchmark datasets.
Score: 67.05452287233122
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Catastrophic interference, also known as catastrophic forgetting, is a fundamental challenge in machine learning, where a trained learning model progressively loses performance on previously learned tasks when adapting to new ones. In this paper, we aim to better understand and model the catastrophic interference problem from a latent representation learning point of view, and propose a novel theoretical framework that formulates catastrophic interference as an identification problem. Our analysis demonstrates that the forgetting phenomenon can be quantified by the distance between partial-task aware (PTA) and all-task aware (ATA) setups. Building upon recent advances in identifiability theory, we prove that this distance can be minimized through identification of shared latent variables between these setups. For learning, we propose our method, \ourmeos, with a two-stage training strategy: first, we employ maximum likelihood estimation to learn the latent representations from both PTA and ATA configurations. Subsequently, we optimize the KL divergence to identify and learn the shared latent variables. Through theoretical guarantees and empirical validation, we establish that identifying and learning these shared representations can effectively mitigate catastrophic interference in machine learning systems. Our approach provides both theoretical guarantees and practical performance improvements across both synthetic and benchmark datasets.
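The two ingredients named in the two-stage strategy above (a maximum likelihood fit of latent distributions, then a KL divergence between them) can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that the PTA and ATA latent codes follow diagonal Gaussians; the synthetic latent codes, the function names `fit_diag_gaussian` and `kl_diag_gaussian`, and the per-dimension comparison are hypothetical and are not the paper's \ourmeos implementation.

```python
import numpy as np


def fit_diag_gaussian(z):
    """MLE fit of a diagonal Gaussian to latent samples z of shape (n, d)."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)  # ddof=0 gives the maximum likelihood variance
    return mu, var


def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """Per-dimension KL(p || q) between diagonal Gaussians p and q (closed form)."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


rng = np.random.default_rng(0)
# Hypothetical latent codes (illustration only): dims 0-1 agree across
# the PTA and ATA setups, dim 2 is shifted by 3 in the ATA setup.
z_pta = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(5000, 3))
z_ata = rng.normal(loc=[0.0, 0.0, 3.0], scale=1.0, size=(5000, 3))

# Stage 1 (illustrative): MLE fit per setup. Stage 2 (illustrative): KL between fits.
kl = kl_diag_gaussian(*fit_diag_gaussian(z_pta), *fit_diag_gaussian(z_ata))
print(kl.round(3))
```

Dimensions with near-zero KL are those on which the two hypothetical setups agree; the shifted dimension 2 should come out close to the closed-form value 3²/2 = 4.5. In an actual model, the Gaussian parameters would come from learned encoders rather than from fixed samples.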

Related papers

On the Paradoxical Interference between Instruction-Following and Task Solving [50.75960598434753]
Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. We reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving.
arXiv Detail & Related papers (2026-01-29T17:48:56Z)
Measure-Theoretic Anti-Causal Representation Learning [29.12751904333385]
Causal representation learning in the anti-causal setting (labels cause features rather than the reverse) presents unique challenges. We propose Anti-Causal Invariant Abstractions (ACIA), a novel measure-theoretic framework for anti-causal representation learning. ACIA employs a two-level design: low-level representations capture how labels generate observations, while high-level representations learn stable causal patterns across environment-specific variations.
arXiv Detail & Related papers (2025-10-16T22:13:05Z)
Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning [11.415684244202312]
We identify two main issues with the conventional bisimulation metric. We propose a revised bisimulation metric that features a more precise definition of reward gap and novel update operators with adaptive coefficient.
arXiv Detail & Related papers (2025-07-24T15:42:22Z)
Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model. It is a dynamic programming algorithm to optimize cognitive representations based on the difficulty of the questions and the performance intervals between them. It provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z)
Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
Independence Constrained Disentangled Representation Learning from Epistemological Perspective [13.51102815877287]
Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process. There is no consensus regarding the objective of disentangled representation learning. We propose a novel method for disentangled representation learning by employing an integration of mutual information constraint and independence constraint.
arXiv Detail & Related papers (2024-09-04T13:00:59Z)
Causality-Aware Transformer Networks for Robotic Navigation [13.719643934968367]
Current research in Visual Navigation reveals opportunities for improvement. Direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling. We propose Causality-Aware Transformer (CAT) Networks for Navigation, featuring a Causal Understanding Module.
arXiv Detail & Related papers (2024-09-04T12:53:26Z)
Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs. Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard. The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA).
arXiv Detail & Related papers (2023-10-01T22:28:34Z)
Which Mutual-Information Representation Learning Objectives are Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data. This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy. Surprisingly, we find that two of these objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP.
arXiv Detail & Related papers (2021-06-14T10:12:34Z)
Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics [24.57617154267565]
We investigate how forgetting affects representations in neural network models. We find that deeper layers are disproportionately the source of forgetting. We also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.
arXiv Detail & Related papers (2020-07-14T23:31:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.