LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior
- URL: http://arxiv.org/abs/2502.17844v1
- Date: Tue, 25 Feb 2025 04:43:41 GMT
- Title: LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior
- Authors: Benjamin C. Koenig, Suyong Kim, Sili Deng
- Abstract summary: The Kolmogorov-Arnold network (KAN) is a promising alternative to multi-layer perceptrons (MLPs) for data-driven modeling. Here, we find that MultKAN layers suffer from limited applicability in output layers. We propose LeanKANs, a direct and modular replacement for MultKAN and traditional AddKAN layers.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The recently proposed Kolmogorov-Arnold network (KAN) is a promising alternative to multi-layer perceptrons (MLPs) for data-driven modeling. While original KAN layers were only capable of representing the addition operator, the recently proposed MultKAN layer combines addition and multiplication subnodes in an effort to improve representation performance. Here, we find that MultKAN layers suffer from a few key drawbacks, including limited applicability in output layers, bulky parameterizations with extraneous activations, and the inclusion of complex hyperparameters. To address these issues, we propose LeanKANs, a direct and modular replacement for MultKAN and traditional AddKAN layers. LeanKANs address these three drawbacks of MultKAN through general applicability as output layers, significantly reduced parameter counts for a given network structure, and a smaller set of hyperparameters. As a one-to-one layer replacement for standard AddKAN and MultKAN layers, LeanKAN provides these benefits to traditional KAN learning problems as well as to augmented KAN structures in which it serves as the backbone, such as KAN Ordinary Differential Equations (KAN-ODEs) or Deep Operator KANs (DeepOKAN). We demonstrate LeanKAN's simplicity and efficiency across both a standard KAN toy problem and a KAN-ODE dynamical system modeling problem, where we find that its sparser parameterization and compact structure increase its expressivity and learning capability, leading it to outperform similar and even much larger MultKANs in various tasks.
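To make the addition/multiplication subnode distinction concrete, below is a minimal, hypothetical PyTorch sketch of a KAN-style layer that mixes the two subnode types. It uses a simple polynomial edge basis in place of the B-spline activations used by these papers; the class names (`PolyEdge`, `AddMultKANLayer`) and the pairwise-product multiplication nodes are illustrative assumptions, not LeanKAN's actual reduced parameterization.

```python
# Illustrative KAN-style layer with addition and multiplication subnodes.
# Assumptions: a degree-3 polynomial edge basis (not B-splines), and
# multiplication subnodes formed as pairwise products of two pre-activations.
import torch
import torch.nn as nn

class PolyEdge(nn.Module):
    """Learnable univariate edge functions phi(x) = sum_k c_k * x^k, one per (input, output) edge."""
    def __init__(self, n_in, n_out, degree=3):
        super().__init__()
        self.coeffs = nn.Parameter(0.1 * torch.randn(n_out, n_in, degree + 1))
        self.degree = degree

    def forward(self, x):                                    # x: (batch, n_in)
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        # contract over input dimension and polynomial degree -> (batch, n_out)
        return torch.einsum('bik,oik->bo', powers, self.coeffs)

class AddMultKANLayer(nn.Module):
    """KAN layer whose output nodes split into addition and multiplication subnodes."""
    def __init__(self, n_in, n_add, n_mult):
        super().__init__()
        # each multiplication node consumes two pre-activations (pairwise product)
        self.edges = PolyEdge(n_in, n_add + 2 * n_mult)
        self.n_add, self.n_mult = n_add, n_mult

    def forward(self, x):
        pre = self.edges(x)                                  # (batch, n_add + 2*n_mult)
        add_out = pre[:, :self.n_add]                        # addition subnodes pass through
        m = pre[:, self.n_add:].reshape(-1, self.n_mult, 2)
        mult_out = m[:, :, 0] * m[:, :, 1]                   # multiplication subnodes: pairwise product
        return torch.cat([add_out, mult_out], dim=-1)

layer = AddMultKANLayer(n_in=4, n_add=3, n_mult=2)
print(layer(torch.randn(8, 4)).shape)                        # torch.Size([8, 5])
```

Per the abstract, LeanKAN's contribution is delivering this add/mult behavior with fewer parameters, fewer hyperparameters, and output-layer applicability compared to a MultKAN layer of the same width; the sketch only shows the structural split.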
Related papers
- asKAN: Active Subspace embedded Kolmogorov-Arnold Network [2.408451825799214]
The Kolmogorov-Arnold Network (KAN) has emerged as a promising neural network architecture for small-scale AI+Science applications.
This study investigates this inflexibility through the lens of the Kolmogorov-Arnold theorem.
We propose active subspace embedded KAN, a hierarchical framework that synergizes KAN's function representation with active subspace methodology.
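For context, the classical active-subspace step that asKAN builds on identifies the dominant input directions of a target function from its gradient covariance and fits a model on the projected coordinates. The sketch below shows that standard step on a toy function; the function `f`, its analytic gradient, and the one-dimensional projection are illustrative assumptions, not the paper's algorithm.

```python
# Standard active-subspace construction on a toy target (illustration only).
import numpy as np

def f(x):                       # toy target: varies mainly along x0 + 2*x1
    return np.sin(x[:, 0] + 2.0 * x[:, 1])

def grad_f(x):                  # analytic gradient of the toy target
    g = np.zeros_like(x)
    c = np.cos(x[:, 0] + 2.0 * x[:, 1])
    g[:, 0], g[:, 1] = c, 2.0 * c
    return g

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 5))          # 5-D inputs, only one active direction
G = grad_f(X)
C = G.T @ G / len(X)                            # uncentered gradient covariance
eigvals, eigvecs = np.linalg.eigh(C)            # ascending eigenvalues
W = eigvecs[:, ::-1][:, :1]                     # top eigenvector spans the active subspace
Y = X @ W                                       # projected (1-D) coordinates
# a KAN (or any regressor) would then be trained on (Y, f(X)) instead of (X, f(X))
print(eigvals[::-1][:3], W[:2, 0])
```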
arXiv Detail & Related papers (2025-04-07T01:43:13Z) - AF-KAN: Activation Function-Based Kolmogorov-Arnold Networks for Efficient Representation Learning [4.843466576537832]
Kolmogorov-Arnold Networks (KANs) have inspired numerous works exploring their applications across a wide range of scientific problems.
We introduce Activation Function-Based Kolmogorov-Arnold Networks (AF-KAN), expanding ReLU-KAN with various activations and their function combinations.
This novel KAN also incorporates parameter reduction methods, primarily attention mechanisms and data normalization, to enhance performance on image classification datasets.
arXiv Detail & Related papers (2025-03-08T07:38:51Z) - PRKAN: Parameter-Reduced Kolmogorov-Arnold Networks [47.947045173329315]
Kolmogorov-Arnold Networks (KANs) represent an innovation in neural network architectures. KANs offer a compelling alternative to Multi-Layer Perceptrons (MLPs) in models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. This paper introduces PRKANs, which employ several methods to reduce the parameter count in KAN layers, making them comparable to MLP layers.
arXiv Detail & Related papers (2025-01-13T03:07:39Z) - Incorporating Arbitrary Matrix Group Equivariance into KANs [69.30866522377694]
We propose Equivariant Kolmogorov-Arnold Networks (EKAN), a method for incorporating arbitrary matrix group equivariance into KANs. EKAN achieves higher accuracy with smaller datasets or fewer parameters on symmetry-related tasks, such as particle scattering and the three-body problem.
arXiv Detail & Related papers (2024-10-01T06:34:58Z) - Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers.
We identify three key challenges: (C1) base function, (C2) parameter and computation inefficiency, and (C3) weight initialization.
With these designs, KAT outperforms traditional MLP-based transformers.
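A hedged sketch of the general idea: keep the attention sublayer of a transformer block and replace the MLP sublayer with KAN-style layers. The polynomial edge basis below is a stand-in chosen for brevity; KAT itself uses a different, GPU-friendly parameterization, so treat `PolyKANLayer` and the block wiring as assumptions.

```python
# Transformer block with the MLP sublayer swapped for KAN-style layers
# (illustrative only; not KAT's actual layer parameterization).
import torch
import torch.nn as nn

class PolyKANLayer(nn.Module):
    def __init__(self, n_in, n_out, degree=3):
        super().__init__()
        self.coeffs = nn.Parameter(0.1 * torch.randn(n_out, n_in, degree + 1))
        self.degree = degree

    def forward(self, x):                                    # (..., n_in)
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        return torch.einsum('...ik,oik->...o', powers, self.coeffs)

class KANTransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        # the usual Linear -> GELU -> Linear MLP is replaced by two KAN-style layers
        self.kan = nn.Sequential(PolyKANLayer(d_model, 4 * d_model),
                                 PolyKANLayer(4 * d_model, d_model))

    def forward(self, x):                                    # (batch, seq, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        return self.norm2(x + self.kan(x))

block = KANTransformerBlock()
print(block(torch.randn(2, 10, 64)).shape)                   # torch.Size([2, 10, 64])
```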
arXiv Detail & Related papers (2024-09-16T17:54:51Z) - Activation Space Selectable Kolmogorov-Arnold Networks [29.450377034478933]
Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been shown to achieve performance comparable to MLP-based methods.
Despite this potential, the use of a single activation function space results in reduced performance of KAN and related works across different tasks.
This work contributes to the understanding of the data-centric design of new AI and provides a foundational reference for innovations in KAN-based network architectures.
arXiv Detail & Related papers (2024-08-15T11:34:05Z) - KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example [0.08192907805418582]
Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons.
We study a typical binary event classification task in high-energy physics.
We find that the learned activation functions of a one-layer KAN resemble the log-likelihood ratio of the input features.
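That observation lines up with the Neyman-Pearson picture: for independent features, the optimal score is a sum of per-feature log-likelihood ratios, which has exactly the additive form of a one-layer KAN, sum_i phi_i(x_i). The toy example below (Gaussian features with an analytic log-likelihood ratio) illustrates that correspondence and is not the paper's analysis.

```python
# Toy illustration: the per-feature log-likelihood ratio plays the role of the
# learned univariate activation phi_i in a one-layer additive KAN.
import numpy as np

rng = np.random.default_rng(1)
mu_sig, mu_bkg, sigma = 1.0, 0.0, 1.0

def log_lr(x):
    # log[ N(x; mu_sig, sigma) / N(x; mu_bkg, sigma) ] for a single feature
    return ((x - mu_bkg) ** 2 - (x - mu_sig) ** 2) / (2.0 * sigma ** 2)

# two independent features: the optimal classifier score is the sum of
# per-feature log-likelihood ratios (Neyman-Pearson), an additive form
x_sig = rng.normal(mu_sig, sigma, size=(5000, 2))
x_bkg = rng.normal(mu_bkg, sigma, size=(5000, 2))
score_sig = log_lr(x_sig).sum(axis=1)
score_bkg = log_lr(x_bkg).sum(axis=1)
print(score_sig.mean() > score_bkg.mean())   # True: signal scores are higher on average
```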
arXiv Detail & Related papers (2024-08-05T18:01:07Z) - Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation [54.50526986788175]
Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs.
We present a unified view of these models, formulating such layers as implicit causal self-attention layers.
Our framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods.
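The core idea can be shown on a toy gated linear recurrence: unrolling it yields a data-dependent, lower-triangular mixing matrix, i.e. an implicit causal attention matrix. The sketch below checks that the recurrent and matrix forms agree; it conveys the flavor of the unified view rather than the paper's exact construction for Mamba or RWKV.

```python
# Gated linear recurrence h_t = a_t * h_{t-1} + v_t rewritten as y = A @ v,
# where A[t, s] = prod_{r=s+1..t} a_r is a causal (lower-triangular) matrix.
import torch

T = 6
a = torch.rand(T)                     # per-step gates in (0, 1)
v = torch.randn(T)                    # per-step inputs ("values")

# recurrent form
h, rec = torch.zeros(()), []
for t in range(T):
    h = a[t] * h + v[t]
    rec.append(h)
rec = torch.stack(rec)

# implicit-attention form: build the causal mixing matrix explicitly
A = torch.zeros(T, T)
for t in range(T):
    for s in range(t + 1):
        A[t, s] = torch.prod(a[s + 1:t + 1]) if s < t else 1.0
att = A @ v

print(torch.allclose(rec, att, atol=1e-6))   # True: both forms agree
```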
arXiv Detail & Related papers (2024-05-26T09:57:45Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) decomposition.
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
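A simplified sketch of the parameter-sharing idea: build every layer's weight from small layer-specific factors around one large shared central factor, so depth grows without replicating the central part. The low-rank three-factor form and the `SharedCoreLinear` construction below are assumptions made for brevity; the paper uses a matrix product operator (tensor-train style) decomposition.

```python
# Simplified "shared central factor" parameter sharing (not the actual MPO decomposition).
import torch
import torch.nn as nn

class SharedCoreLinear(nn.Module):
    def __init__(self, core, d_in, d_out):
        super().__init__()
        r = core.shape[0]
        self.core = core                                      # shared nn.Parameter, tied across layers
        self.left = nn.Parameter(0.02 * torch.randn(d_out, r))  # layer-specific auxiliary factor
        self.right = nn.Parameter(0.02 * torch.randn(r, d_in))  # layer-specific auxiliary factor
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        W = self.left @ self.core @ self.right                # reconstruct full weight on the fly
        return x @ W.T + self.bias

class DeepSharedMLP(nn.Module):
    def __init__(self, d=128, rank=32, depth=12):
        super().__init__()
        self.core = nn.Parameter(torch.eye(rank))             # one central factor for all layers
        self.layers = nn.ModuleList(SharedCoreLinear(self.core, d, d) for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

model = DeepSharedMLP()
# about half the parameters of 12 unshared 128x128 linear layers (shared core counted once)
print(sum(p.numel() for p in model.parameters()))
```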
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs now maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
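A hedged sketch of the atom-coefficient idea: convolution kernels are assembled as linear combinations of a small shared bank of kernel atoms, so each layer mainly learns mixing coefficients rather than full kernels. The `AtomConv2d` module and the choice to share one atom bank across two layers are illustrative assumptions rather than ACDC's exact formulation.

```python
# Convolution kernels built from a shared bank of kernel "atoms" plus per-layer
# mixing coefficients (illustration of the weight-sharing idea only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomConv2d(nn.Module):
    def __init__(self, atoms, in_ch, out_ch):
        super().__init__()
        self.atoms = atoms                                    # (n_atoms, k, k), shared across layers
        n_atoms = atoms.shape[0]
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, n_atoms))

    def forward(self, x):
        # kernel[o, i] = sum_a coeffs[o, i, a] * atoms[a]
        kernel = torch.einsum('oia,akl->oikl', self.coeffs, self.atoms)
        return F.conv2d(x, kernel, padding=self.atoms.shape[-1] // 2)

# in a full model the atom bank would be registered on a shared parent module
atoms = nn.Parameter(torch.randn(6, 3, 3))                   # 6 shared 3x3 atoms
conv1, conv2 = AtomConv2d(atoms, 3, 16), AtomConv2d(atoms, 16, 16)
x = torch.randn(1, 3, 32, 32)
print(conv2(torch.relu(conv1(x))).shape)                      # torch.Size([1, 16, 32, 32])
```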