Related papers: Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons

URL: http://arxiv.org/abs/2409.10463v1
Date: Mon, 16 Sep 2024 16:56:08 GMT
Title: Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons
Authors: Farhad Pourkamali-Anaraki
Abstract summary: Kolmogorov-Arnold Networks (KANs) use highly flexible learnable activation functions directly on network edges. KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. We show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters.
Score: 2.77390041716769
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multilayer Perceptrons (MLPs) have long been a cornerstone in deep learning, known for their capacity to model complex relationships. Recently, Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative, utilizing highly flexible learnable activation functions directly on network edges, a departure from the neuron-centric approach of MLPs. However, KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. This paper presents a comprehensive comparative study of MLPs and KANs from both algorithmic and experimental perspectives, with a focus on low-data regimes. We introduce an effective technique for designing MLPs with unique, parameterized activation functions for each neuron, enabling a more balanced comparison with KANs. Using empirical evaluations on simulated data and two real-world data sets from medicine and engineering, we explore the trade-offs between model complexity and accuracy, with particular attention to the role of network depth. Our findings show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters, especially when the sample size is limited to around one hundred. For example, in a three-class classification problem within additive manufacturing, MLPs achieve a median accuracy of 0.91, significantly outperforming KANs, which only reach a median accuracy of 0.53 with default hyperparameters. These results offer valuable insights into the impact of activation function selection in neural networks.
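The abstract does not spell out how the per-neuron activation functions are parameterized, so the following is only a minimal sketch under a stated assumption: each hidden neuron owns one learnable slope `alpha` applied to negative pre-activations (a PReLU-style choice, not necessarily the paper's). The names `init_layer`, `forward`, and `count_params`, and the layer sizes, are hypothetical. It illustrates why such a design adds only a modest number of parameters: one extra value per hidden neuron.

```python
import numpy as np

def init_layer(rng, d_in, d_out, hidden=True):
    """Create one dense layer; hidden layers get one activation parameter per neuron."""
    return {
        "W": rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out)),
        "b": np.zeros(d_out),
        # Assumed parameterization: per-neuron slope for negative inputs.
        # The output layer stays linear, so it carries no alpha.
        "alpha": np.full(d_out, 0.25) if hidden else None,
    }

def forward(layers, x):
    """Forward pass in which neuron j of a hidden layer uses its own slope alpha[j]."""
    h = x
    for layer in layers:
        z = h @ layer["W"] + layer["b"]
        if layer["alpha"] is None:
            h = z  # linear output layer
        else:
            h = np.where(z >= 0.0, z, layer["alpha"] * z)
    return h

def count_params(layers):
    """Return (weights and biases, extra per-neuron activation parameters)."""
    shared = sum(l["W"].size + l["b"].size for l in layers)
    extra = sum(l["alpha"].size for l in layers if l["alpha"] is not None)
    return shared, extra

rng = np.random.default_rng(0)
sizes = [4, 16, 16, 3]  # hypothetical: 4 features, two hidden layers, 3 classes
n_layers = len(sizes) - 1
layers = [
    init_layer(rng, sizes[i], sizes[i + 1], hidden=(i < n_layers - 1))
    for i in range(n_layers)
]

out = forward(layers, rng.normal(size=(100, 4)))  # about one hundred samples
shared, extra = count_params(layers)
print(out.shape, shared, extra)
```

With these sizes the network has 403 weights and biases and 32 activation parameters, so the individualized activations add under 10% to the parameter count in this toy configuration.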

Related papers

Kolmogorov Arnold Networks and Multi-Layer Perceptrons: A Paradigm Shift in Neural Modelling [1.6998720690708842]
The research undertakes a comprehensive comparative analysis of Kolmogorov-Arnold Networks (KAN) and Multi-Layer Perceptrons (MLP). KANs utilize spline-based activation functions and grid-based structures, providing a transformative approach compared to traditional neural network frameworks. The proposed study highlights the transformative capabilities of KANs in progressing intelligent systems.
arXiv Detail & Related papers (2026-01-15T16:26:49Z)
Scientific Machine Learning with Kolmogorov-Arnold Networks [0.0]
The field of scientific machine learning is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep operator learning. We highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs' advantages in capturing complex dynamics while learning more effectively.
arXiv Detail & Related papers (2025-07-30T01:26:44Z)
Personalized Control for Lower Limb Prosthesis Using Kolmogorov-Arnold Networks [0.0]
This paper investigates the potential of learnable activation functions in Kolmogorov-Arnold Networks (KANs) for personalized control in a lower-limb prosthesis. In addition, user-specific vs. pooled training data is evaluated to improve machine learning (ML) and deep learning (DL) performance for turn intent prediction.
arXiv Detail & Related papers (2025-05-14T13:18:57Z)
Enhancing Federated Learning with Kolmogorov-Arnold Networks: A Comparative Study Across Diverse Aggregation Strategies [0.24578723416255752]
Kolmogorov-Arnold Networks (KAN) have shown promising capabilities in modeling complex nonlinear relationships. KANs consistently outperform Multilayer Perceptrons in terms of accuracy, stability, and convergence efficiency.
arXiv Detail & Related papers (2025-05-12T14:56:27Z)
Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD [10.459304300065186]
We introduce a novel pseudolikelihood-based graphical model framework. It maintains estimation and selection consistency in various metrics under high-dimensional assumptions. A high-performance computing implementation of our framework was tested in simulated data with up to one million variables.
arXiv Detail & Related papers (2024-12-16T08:38:02Z)
A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks [43.70716358136333]
Kolmogorov-Arnold Networks (KANs) are based on a fundamentally different mathematical framework. KANs address several major issues in MLPs, such as forgetting in continual learning scenarios. We extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision.
arXiv Detail & Related papers (2024-09-20T14:49:21Z)
Discovering Long-Term Effects on Parameter Efficient Fine-tuning [36.83255498301937]
Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern recognition capabilities. ANNs share extensive similarities with the human brain, specifically Biological Neural Networks (BNNs). ANNs can acquire new knowledge through fine-tuning.
arXiv Detail & Related papers (2024-08-24T03:27:29Z)
Kolmogorov-Arnold Network for Online Reinforcement Learning [0.22615818641180724]
Kolmogorov-Arnold Networks (KANs) have shown potential as an alternative to Multi-Layer Perceptrons (MLPs) in neural networks. KANs provide universal function approximation with fewer parameters and reduced memory usage.
arXiv Detail & Related papers (2024-08-09T03:32:37Z)
Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction [53.88231294380083]
We introduce a novel Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data. Our findings confirm that pre-trained layers can adapt to new embedding spaces, enhancing performance without overfitting.
arXiv Detail & Related papers (2024-06-27T04:00:15Z)
Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model [43.107778640669544]
Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles. Recent studies have revealed that not all neurons are active across different datasets. We introduce Neuron-Level Fine-Tuning (NeFT), a novel approach that refines the granularity of parameter training down to the individual neuron.
arXiv Detail & Related papers (2024-03-18T09:55:01Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
Relational Neural Markov Random Fields [29.43155380361715]
We introduce Relational Neural Markov Random Fields (RN-MRFs), which allow handling of complex hybrid domains. We propose a maximum pseudolikelihood estimation-based learning algorithm with importance sampling for training the potential parameters.
arXiv Detail & Related papers (2021-10-18T22:52:54Z)
MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to their large parameter capacity, but this also leads to a huge computation cost. We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon. We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting. We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior. We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.