A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2409.13550v2
- Date: Fri, 27 Sep 2024 15:41:33 GMT
- Title: A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks
- Authors: Alessandro Cacciatore, Valerio Morelli, Federica Paganica, Emanuele Frontoni, Lucia Migliorelli, Daniele Berardini
- Abstract summary: Kolmogorov-Arnold Networks (KAN) are based on a fundamentally different mathematical framework.
KANs address several major issues in MLPs, such as catastrophic forgetting in continual learning scenarios.
We extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision.
- Score: 43.70716358136333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has long been dominated by multi-layer perceptrons (MLPs), which have demonstrated superiority over other optimizable models in various domains. Recently, a new alternative to MLPs has emerged: Kolmogorov-Arnold Networks (KANs), which are based on a fundamentally different mathematical framework. According to their authors, KANs address several major issues in MLPs, such as catastrophic forgetting in continual learning scenarios. However, this claim has only been supported by results from a regression task on a toy 1D dataset. In this paper, we extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision, specifically using the MNIST datasets. To this end, we conduct a structured analysis of the behavior of MLPs and two KAN-based models in a class-incremental learning scenario, ensuring that the architectures involved have the same number of trainable parameters. Our results demonstrate that an efficient version of KAN outperforms both traditional MLPs and the original KAN implementation. We further analyze the influence of hyperparameters in MLPs and KANs, as well as the impact of certain trainable parameters in KANs, such as bias and scale weights. Additionally, we provide a preliminary investigation of recent KAN-based convolutional networks and compare their performance with that of traditional convolutional neural networks. Our code can be found at https://github.com/MrPio/KAN-Continual_Learning_tests.
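The evaluation protocol the abstract describes (sequential per-class tasks on MNIST, parameter-matched architectures, accuracy tracked over all classes seen so far) can be sketched in a few lines. The following is a minimal illustration, not the authors' code (that lives in the linked repository): the 5-task split, the MLP width, and the single epoch per task are illustrative choices, and a KAN with a matched parameter count would be dropped into the same loop.

```python
# Minimal sketch of a class-incremental evaluation like the one the abstract
# describes. Not the authors' implementation (see the linked repository); the
# MLP below is a placeholder for any parameter-matched model, including a KAN.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)

# Split the 10 digits into 5 tasks of 2 classes each (one common CIL setup).
tasks = [(2 * t, 2 * t + 1) for t in range(5)]

def subset_by_class(ds, classes):
    idx = [i for i, y in enumerate(ds.targets.tolist()) if y in classes]
    return Subset(ds, idx)

model = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10)
).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

seen = []
for task_classes in tasks:
    seen.extend(task_classes)
    loader = DataLoader(subset_by_class(train_set, task_classes),
                        batch_size=128, shuffle=True)
    model.train()
    for _ in range(1):  # one epoch per task keeps the sketch fast
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # Evaluate on every class seen so far: the accuracy drop on earlier
    # classes is the catastrophic forgetting the paper measures.
    model.eval()
    test_loader = DataLoader(subset_by_class(test_set, seen), batch_size=256)
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            pred = model(x.to(device)).argmax(1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    print(f"after classes {seen}: accuracy {correct / total:.3f}")
```

Matching trainable parameters, as the paper does, amounts to sizing each architecture so that `sum(p.numel() for p in model.parameters())` agrees across models; the accuracy printed after each stage is where forgetting shows up as a drop on earlier classes.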
Related papers
- Kolmogorov-Arnold Network Autoencoders [0.0]
Kolmogorov-Arnold Networks (KANs) are promising alternatives to Multi-Layer Perceptrons (MLPs).
KANs align closely with the Kolmogorov-Arnold representation theorem, potentially enhancing both model accuracy and interpretability.
Our results demonstrate that KAN-based autoencoders achieve competitive performance in terms of reconstruction accuracy.
arXiv Detail & Related papers (2024-10-02T22:56:00Z) - Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons [2.77390041716769]
Kolmogorov-Arnold Networks (KANs) use highly flexible learnable activation functions directly on network edges (see the sketch after this list).
KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments.
We show that individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters.
arXiv Detail & Related papers (2024-09-16T16:56:08Z) - KAN v.s. MLP for Offline Reinforcement Learning [4.3621896506713185]
Kolmogorov-Arnold Networks (KANs) are an emerging neural network architecture in machine learning.
In this paper, we explore the incorporation of KAN into the actor and critic networks for offline reinforcement learning.
arXiv Detail & Related papers (2024-09-15T07:52:44Z) - Kolmogorov-Arnold Network for Online Reinforcement Learning [0.22615818641180724]
Kolmogorov-Arnold Networks (KANs) have shown potential as an alternative to Multi-Layer Perceptrons (MLPs) in neural networks.
KANs provide universal function approximation with fewer parameters and reduced memory usage.
arXiv Detail & Related papers (2024-08-09T03:32:37Z) - A comprehensive and FAIR comparison between MLP and KAN representations for differential equations and operator networks [8.573300153709358]
Kolmogorov-Arnold Networks (KANs) were recently introduced as an alternative representation model to the standard multi-layer perceptron (MLP).
Herein, we employ KANs to construct machine learning models (PIKANs) and deep operator models (DeepOKANs) for solving differential equations for forward and inverse problems.
arXiv Detail & Related papers (2024-06-05T04:10:36Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction from two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Recent Developments Combining Ensemble Smoother and Deep Generative
Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoder networks to construct a continuous parameterization for facies models.
We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z)