Improving Memory Efficiency for Training KANs via Meta Learning
- URL: http://arxiv.org/abs/2506.07549v1
- Date: Mon, 09 Jun 2025 08:38:26 GMT
- Title: Improving Memory Efficiency for Training KANs via Meta Learning
- Authors: Zhangchi Zhao, Jun Shu, Deyu Meng, Zongben Xu
- Abstract summary: We propose to generate weights for KANs via a smaller meta-learner, called MetaKANs. By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance.
- Score: 55.24089119864207
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Inspired by the Kolmogorov-Arnold representation theorem, KANs offer a novel framework for function approximation by replacing traditional neural network weights with learnable univariate functions. This design demonstrates significant potential as an efficient and interpretable alternative to traditional MLPs. However, KANs are characterized by a substantially larger number of trainable parameters, leading to challenges in memory efficiency and higher training costs compared to MLPs. To address this limitation, we propose to generate weights for KANs via a smaller meta-learner, called MetaKANs. By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance while significantly reducing the number of trainable parameters and maintaining promising interpretability. Extensive experiments on diverse benchmark tasks, including symbolic regression, partial differential equation solving, and image classification, demonstrate the effectiveness of MetaKANs in improving parameter efficiency and memory usage. The proposed method provides an alternative technique for training KANs that allows for greater scalability and extensibility, and narrows the training-cost gap with MLPs reported in the original KAN paper. Our code is available at https://github.com/Murphyzc/MetaKAN.
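The core idea of the abstract can be illustrated with a minimal sketch: rather than storing a full spline-coefficient vector for every KAN edge, a small shared meta-learner maps a compact learnable per-edge embedding to those coefficients, so the trainable-parameter count drops from edges × coefficients to edges × embedding-dim plus the (small) meta-learner. The layer sizes, embedding dimension, and meta-learner architecture below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: a single KAN layer with one univariate spline per edge.
n_in, n_out = 64, 64
n_edges = n_in * n_out
n_coef = 16     # spline coefficients per edge function
d_embed = 3     # per-edge embedding dimension (d_embed << n_coef)

# Trainable parts: per-edge embeddings plus a tiny one-hidden-layer meta-learner.
edge_embed = rng.normal(size=(n_edges, d_embed))
W1, b1 = rng.normal(size=(d_embed, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, n_coef)), np.zeros(n_coef)

def meta_learner(e):
    """Generate spline coefficients for all edges from their embeddings."""
    h = np.tanh(e @ W1 + b1)
    return h @ W2 + b2          # shape: (n_edges, n_coef)

coef = meta_learner(edge_embed)  # generated KAN weights, never stored directly

direct = n_edges * n_coef                                      # plain KAN
meta = edge_embed.size + W1.size + b1.size + W2.size + b2.size  # MetaKAN-style
print(coef.shape, direct, meta)  # generated coefficients; meta << direct here
```

Because the meta-learner is differentiable, gradients flow from the KAN's loss through the generated coefficients into the embeddings and meta-learner weights, matching the end-to-end training described above.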
Related papers
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - Low Tensor-Rank Adaptation of Kolmogorov--Arnold Networks [70.06682043272377]
Kolmogorov--Arnold networks (KANs) have demonstrated their potential as an alternative to multi-layer perceptrons (MLPs) in various domains. We develop low tensor-rank adaptation (LoTRA) for fine-tuning KANs. We explore the application of LoTRA for efficiently solving various partial differential equations (PDEs) by fine-tuning KANs.
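A KAN layer's spline coefficients naturally form a third-order tensor (input × output × coefficient), so a low tensor-rank update in the spirit of LoTRA can be sketched as a small Tucker-style factorization added to a frozen pre-trained tensor. The ranks, sizes, and factorization form below are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_out, n_coef = 64, 64, 16    # KAN layer coefficients as a 3D tensor
r1, r2, r3 = 4, 4, 4                # assumed Tucker ranks (much smaller)

W = rng.normal(size=(n_in, n_out, n_coef))   # frozen pre-trained coefficients

# Trainable low tensor-rank update: a small core plus three factor matrices.
G = rng.normal(size=(r1, r2, r3)) * 0.01
A = rng.normal(size=(n_in, r1))
B = rng.normal(size=(n_out, r2))
C = rng.normal(size=(n_coef, r3))

# Tucker reconstruction: delta[i,j,k] = sum_{a,b,c} G[a,b,c] A[i,a] B[j,b] C[k,c]
delta = np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)
W_adapted = W + delta                # only G, A, B, C would receive gradients

full = W.size                        # full fine-tuning parameter count
lotra = G.size + A.size + B.size + C.size
print(W_adapted.shape, full, lotra)  # lotra << full
```

As with LoRA for linear layers, the appeal is that the adapted tensor stays full-size at inference while only the small factors are trained per task.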
arXiv Detail & Related papers (2025-02-10T04:57:07Z) - Free-Knots Kolmogorov-Arnold Network: On the Analysis of Spline Knots and Advancing Stability [16.957071012748454]
Kolmogorov-Arnold Neural Networks (KANs) have gained significant attention in the machine learning community. However, their implementation often suffers from poor training stability and a heavy trainable-parameter burden. In this work, we analyze the behavior of KANs through the lens of spline knots and derive lower and upper bounds for the number of knots in B-spline-based KANs.
arXiv Detail & Related papers (2025-01-16T04:12:05Z) - A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks [43.70716358136333]
Kolmogorov-Arnold Networks (KAN) are based on a fundamentally different mathematical framework.
KANs address several major issues in MLPs, such as forgetting in continual learning scenarios.
We extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision.
arXiv Detail & Related papers (2024-09-20T14:49:21Z) - KAN v.s. MLP for Offline Reinforcement Learning [4.3621896506713185]
Kolmogorov-Arnold Networks (KAN) is an emerging neural network architecture in machine learning.
In this paper, we explore the incorporation of KAN into the actor and critic networks for offline reinforcement learning.
arXiv Detail & Related papers (2024-09-15T07:52:44Z) - Activation Space Selectable Kolmogorov-Arnold Networks [29.450377034478933]
Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been proven to achieve performance comparable to MLP-based methods.
Despite this potential, the use of a single activation function space results in reduced performance of KAN and related works across different tasks.
This work contributes to the understanding of the data-centric design of new AI and provides a foundational reference for innovations in KAN-based network architectures.
arXiv Detail & Related papers (2024-08-15T11:34:05Z) - Kolmogorov-Arnold Network for Online Reinforcement Learning [0.22615818641180724]
Kolmogorov-Arnold Networks (KANs) have shown potential as an alternative to Multi-Layer Perceptrons (MLPs) in neural networks.
KANs provide universal function approximation with fewer parameters and reduced memory usage.
arXiv Detail & Related papers (2024-08-09T03:32:37Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z) - Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.