Catastrophic Forgetting in Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2511.12828v1
- Date: Sun, 16 Nov 2025 23:22:50 GMT
- Title: Catastrophic Forgetting in Kolmogorov-Arnold Networks
- Authors: Mohammad Marufur Rahman, Guanchu Wang, Kaixiong Zhou, Minghan Chen, Fan Yang,
- Abstract summary: Catastrophic forgetting is a longstanding challenge in continual learning.<n>Recent architectural advances like Kolmogorov-Arnold Networks (KANs) have been suggested to offer intrinsic resistance to forgetting.<n>We present a comprehensive study of catastrophic forgetting in KANs and develop a theoretical framework that links forgetting to activation support overlap and intrinsic data dimension.
- Score: 27.683054983159835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Catastrophic forgetting is a longstanding challenge in continual learning, where models lose knowledge from earlier tasks when learning new ones. While various mitigation strategies have been proposed for Multi-Layer Perceptrons (MLPs), recent architectural advances like Kolmogorov-Arnold Networks (KANs) have been suggested to offer intrinsic resistance to forgetting by leveraging localized spline-based activations. However, the practical behavior of KANs under continual learning remains unclear, and their limitations are not well understood. To address this, we present a comprehensive study of catastrophic forgetting in KANs and develop a theoretical framework that links forgetting to activation support overlap and intrinsic data dimension. We validate these analyses through systematic experiments on synthetic and vision tasks, measuring forgetting dynamics under varying model configurations and data complexity. Further, we introduce KAN-LoRA, a novel adapter design for parameter-efficient continual fine-tuning of language models, and evaluate its effectiveness in knowledge editing tasks. Our findings reveal that while KANs exhibit promising retention in low-dimensional algorithmic settings, they remain vulnerable to forgetting in high-dimensional domains such as image classification and language modeling. These results advance the understanding of KANs' strengths and limitations, offering practical insights for continual learning system design.
Related papers
- Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning [17.299267108673277]
Hebbian learning is a biological principle that intuitively describes how neurons adapt their connections through repeated stimuli.<n>We introduce the Structural Projection Hebbian Representation (SPHeRe), a novel unsupervised learning method.<n> Experimental results show that SPHeRe achieves SOTA performance among unsupervised synaptic plasticity approaches.
arXiv Detail & Related papers (2025-10-16T15:47:29Z) - Scientific Machine Learning with Kolmogorov-Arnold Networks [0.0]
The field of scientific machine learning is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding.<n>KANs address issues with enhanced interpretability and flexibility, enabling more efficient modeling of complex nonlinear interactions.<n>This review categorizes recent progress in KAN-based models across three perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep-operator learning.
arXiv Detail & Related papers (2025-07-30T01:26:44Z) - An Overview of Low-Rank Structures in the Training and Adaptation of Large Models [52.67110072923365]
Recent research has uncovered a widespread phenomenon in deep networks: the emergence of low-rank structures.<n>These implicit low-dimensional patterns provide valuable insights for improving the efficiency of training and fine-tuning large-scale models.<n>We present a comprehensive review of advances in exploiting low-rank structures for deep learning and shed light on their mathematical foundations.
arXiv Detail & Related papers (2025-03-25T17:26:09Z) - Conservation-informed Graph Learning for Spatiotemporal Dynamics Prediction [84.26340606752763]
In this paper, we introduce the conservation-informed GNN (CiGNN), an end-to-end explainable learning framework.<n>The network is designed to conform to the general symmetry conservation law via symmetry where conservative and non-conservative information passes over a multiscale space by a latent temporal marching strategy.<n>Results demonstrate that CiGNN exhibits remarkable baseline accuracy and generalizability, and is readily applicable to learning for prediction of varioustemporal dynamics.
arXiv Detail & Related papers (2024-12-30T13:55:59Z) - Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Low-Rank Learning by Design: the Role of Network Architecture and
Activation Linearity in Gradient Rank Collapse [14.817633094318253]
We study how architectural choices and structure of the data effect gradient rank bounds in deep neural networks (DNNs)
Our theoretical analysis provides these bounds for training fully-connected, recurrent, and convolutional neural networks.
We also demonstrate, both theoretically and empirically, how design choices like activation function linearity, bottleneck layer introduction, convolutional stride, and sequence truncation influence these bounds.
arXiv Detail & Related papers (2024-02-09T19:28:02Z) - A Detailed Study of Interpretability of Deep Neural Network based Top
Taggers [3.8541104292281805]
Recent developments in explainable AI (XAI) allow researchers to explore the inner workings of deep neural networks (DNNs)
We explore interpretability of models designed to identify jets coming from top quark decay in high energy proton-proton collisions at the Large Hadron Collider (LHC)
Our studies uncover some major pitfalls of existing XAI methods and illustrate how they can be overcome to obtain consistent and meaningful interpretation of these models.
arXiv Detail & Related papers (2022-10-09T23:02:42Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent
Structure Learning [20.506232306308977]
Latent structure models are a powerful tool for modeling language data.
One challenge with end-to-end training of these models is the argmax operation, which has null gradient.
We explore latent structure learning through the angle of pulling back the downstream learning objective.
arXiv Detail & Related papers (2020-10-05T21:56:00Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d)
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - Towards Interpretable Deep Learning Models for Knowledge Tracing [62.75876617721375]
We propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models.
Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret RNN-based DLKT model.
Experiment results show the feasibility using the LRP method for interpreting the DLKT model's predictions.
arXiv Detail & Related papers (2020-05-13T04:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.