PowerMLP: An Efficient Version of KAN
- URL: http://arxiv.org/abs/2412.13571v1
- Date: Wed, 18 Dec 2024 07:42:34 GMT
- Title: PowerMLP: An Efficient Version of KAN
- Authors: Ruichen Qiu, Yibo Miao, Shiwen Wang, Lijia Yu, Yifan Zhu, Xiao-Shan Gao
- Abstract summary: The Kolmogorov-Arnold Network (KAN) is a new network architecture known for its high accuracy in several tasks such as function fitting and PDE solving.
The superior expressive capability of KAN arises from the Kolmogorov-Arnold representation theorem and learnable spline functions.
PowerMLP achieves higher accuracy and a training speed about 40 times faster than KAN in various tasks.
- Score: 10.411788782126091
- Abstract: The Kolmogorov-Arnold Network (KAN) is a new network architecture known for its high accuracy in several tasks such as function fitting and PDE solving. The superior expressive capability of KAN arises from the Kolmogorov-Arnold representation theorem and learnable spline functions. However, the computation of spline functions involves multiple iterations, which renders KAN significantly slower than MLP, thereby increasing the cost associated with model training and deployment. The authors of KAN have also noted that ``the biggest bottleneck of KANs lies in its slow training. KANs are usually 10x slower than MLPs, given the same number of parameters.'' To address this issue, we propose a novel MLP-type neural network PowerMLP that employs simpler non-iterative spline function representation, offering approximately the same training time as MLP while theoretically demonstrating stronger expressive power than KAN. Furthermore, we compare the FLOPs of KAN and PowerMLP, quantifying the faster computation speed of PowerMLP. Our comprehensive experiments demonstrate that PowerMLP generally achieves higher accuracy and a training speed about 40 times faster than KAN in various tasks.
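Since the abstract hinges on the contrast between KAN's iterative spline evaluation and PowerMLP's non-iterative representation, the sketch below illustrates that difference in PyTorch. It is a rough illustration only, not the paper's exact layer: the B-spline routine follows the standard Cox-de Boor recursion used in KAN-style layers, while `ReLUPowerLayer` stands in for the non-iterative ReLU-power idea; all names, shapes, and hyperparameters here are assumptions.

```python
# Illustrative sketch, not the paper's architecture: a KAN-style B-spline basis needs
# an iterative Cox-de Boor recursion, while a ReLU-power basis is a single closed-form
# expression evaluated in one pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


def bspline_basis(x, grid, k=3):
    """Cox-de Boor recursion: k sequential refinement passes (the slow part of KAN).

    x: (batch, d_in), grid: (d_in, n_knots) with n_knots >= k + 2.
    Returns basis values of shape (batch, d_in, n_knots - k - 1).
    """
    x = x.unsqueeze(-1)                                   # (batch, d_in, 1)
    bases = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).to(x.dtype)
    for p in range(1, k + 1):                             # iterative refinement
        left = (x - grid[:, : -(p + 1)]) / (grid[:, p:-1] - grid[:, : -(p + 1)])
        right = (grid[:, p + 1:] - x) / (grid[:, p + 1:] - grid[:, 1:-p])
        bases = left * bases[..., :-1] + right * bases[..., 1:]
    return bases


class ReLUPowerLayer(nn.Module):
    """MLP-type layer with a ReLU^k activation: one matmul plus one elementwise power."""

    def __init__(self, d_in, d_out, k=3):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.k = k

    def forward(self, x):
        return F.relu(self.linear(x)) ** self.k           # no per-sample iteration
```

The point of the contrast is operation count: the B-spline basis requires k sequential refinement passes on every forward call, whereas the ReLU-power activation is a single matrix multiplication followed by an elementwise power.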
Related papers
- OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters.
They often pose optimization challenges, with poor convergence.
We introduce an over-parameterized approach that accelerates training without increasing inference costs.
We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z) - On the expressiveness and spectral bias of KANs [17.42614039265962]
KANs were recently proposed as a potential alternative to the prevalent architectural backbone of many deep learning models, the multi-layer perceptron (MLP).
KANs have seen success in various tasks of AI for science, with their empirical efficiency and accuracy demonstrated in function regression, PDE solving, and many more scientific problems.
arXiv Detail & Related papers (2024-10-02T17:57:38Z) - Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers.
We identify three key challenges: (C1) base function, (C2) parameter and computation inefficiency, and (C3) weight initialization.
With these designs, KAT outperforms traditional MLP-based transformers.
arXiv Detail & Related papers (2024-09-16T17:54:51Z) - KAN v.s. MLP for Offline Reinforcement Learning [4.3621896506713185]
Kolmogorov-Arnold Networks (KAN) is an emerging neural network architecture in machine learning.
In this paper, we explore the incorporation of KAN into the actor and critic networks for offline reinforcement learning.
arXiv Detail & Related papers (2024-09-15T07:52:44Z) - SA-MLP: A Low-Power Multiplication-Free Deep Network for 3D Point Cloud Classification in Resource-Constrained Environments [46.266960248570086]
Point cloud classification plays a crucial role in the processing and analysis of data from 3D sensors such as LiDAR.
Traditional neural networks, which rely heavily on multiplication operations, often face challenges in terms of high computational costs and energy consumption.
This study presents a novel family of efficient multiplication-free architectures designed to improve the computational efficiency of point cloud classification tasks.
arXiv Detail & Related papers (2024-09-03T15:43:44Z) - Kolmogorov-Arnold Network for Online Reinforcement Learning [0.22615818641180724]
Kolmogorov-Arnold Networks (KANs) have shown potential as an alternative to Multi-Layer Perceptrons (MLPs) in neural networks.
KANs provide universal function approximation with fewer parameters and reduced memory usage.
arXiv Detail & Related papers (2024-08-09T03:32:37Z) - KAN or MLP: A Fairer Comparison [63.794304207664176]
This paper offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks.
We control the number of parameters and FLOPs to compare the performance of KAN and MLP.
We find that KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting.
arXiv Detail & Related papers (2024-07-23T17:43:35Z) - KAN: Kolmogorov-Arnold Networks [16.782018138008578]
We propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).
KANs have learnable activation functions on edges ("weights").
We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability.
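As a toy illustration of "learnable activation functions on edges" (simplified: the actual KAN parameterizes each edge with B-splines plus a base function, not the RBF basis assumed here), the layer below gives every edge its own learnable one-dimensional function instead of a scalar weight.

```python
# Toy contrast: an MLP edge carries one scalar weight; a KAN-style edge carries a
# learnable 1-D function, here parameterized by coefficients over a fixed RBF basis.
import torch
import torch.nn as nn


class ToyKANLayer(nn.Module):
    def __init__(self, d_in, d_out, n_basis=8):
        super().__init__()
        # One coefficient vector per edge (d_out x d_in edges).
        self.coef = nn.Parameter(torch.randn(d_out, d_in, n_basis) * 0.1)
        self.register_buffer("centers", torch.linspace(-1.0, 1.0, n_basis))

    def forward(self, x):                                  # x: (batch, d_in)
        # Fixed RBF basis as a stand-in for B-splines.
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.1)  # (batch, d_in, n_basis)
        # Each output neuron sums its d_in learnable edge functions.
        return torch.einsum("bif,oif->bo", phi, self.coef)
```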
arXiv Detail & Related papers (2024-04-30T17:58:29Z) - Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for vision-language pre-trained models.
We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z) - Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
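A minimal sketch of the mixture-of-experts idea mentioned above, with top-1 token routing; this is an illustration under assumed sizes and gating, not the paper's sparse all-MLP architecture.

```python
# Toy MoE feed-forward block with top-1 gating: each token activates one expert MLP.
# Expert count and layer sizes are arbitrary placeholders.
import torch
import torch.nn as nn


class Top1MoEFeedForward(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = self.gate(x).softmax(dim=-1)   # (batch, seq, n_experts)
        top_p, top_i = scores.max(dim=-1)       # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e                   # tokens routed to expert e
            if mask.any():
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out
```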
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.