Related papers: KAN: Kolmogorov-Arnold Networks

KAN: Kolmogorov-Arnold Networks

URL: http://arxiv.org/abs/2404.19756v4
Date: Sun, 16 Jun 2024 13:34:56 GMT
Title: KAN: Kolmogorov-Arnold Networks
Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark,
Abstract summary: We propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs) KANs have learnable activation functions on edges ("weights") We show that this seemingly simple change makes KANs outperform neurals in terms of accuracy and interpretability.
Score: 16.782018138008578
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.

Related papers

Improving Memory Efficiency for Training KANs via Meta Learning [55.24089119864207]
We propose to generate weights for KANs via a smaller meta-learner, called MetaKANs.<n>By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance.
arXiv Detail & Related papers (2025-06-09T08:38:26Z)
Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction [61.70012924088756]
Distilling Graph Neural Networks (GNNs) teachers into Multi-Layer Perceptrons (MLPs) students has emerged as an effective approach to achieve strong performance. However, existing distillation methods only use standard GNNs and overlook alternative teachers such as specialized model for link prediction (GNN4LP) and methods (e.g., common neighbors) This paper first explores the impact of different teachers in GNN-to-MLP distillation, we find that stronger teachers do not always produce stronger students, while weaker methods can teachs to near-GNN performance with drastically reduced training costs
arXiv Detail & Related papers (2025-04-08T16:35:11Z)
PRKAN: Parameter-Reduced Kolmogorov-Arnold Networks [47.947045173329315]
Kolmogorov-Arnold Networks (KANs) represent an innovation in neural network architectures. KANs offer a compelling alternative to Multi-Layer Perceptrons (MLPs) in models such as CNNs, RecurrentReduced Networks (RNNs) and Transformers. This paper introduces PRKANs, which employ several methods to reduce the parameter count in layers, making them comparable to Neural M layers.
arXiv Detail & Related papers (2025-01-13T03:07:39Z)
PowerMLP: An Efficient Version of KAN [10.411788782126091]
The Kolmogorov-Arnold Network (KAN) is a new network architecture known for its high accuracy in several tasks such as function fitting and PDE solving. The superior computation capability of KAN arises from the Kolmogorov-Arnold representation and learnable spline functions. PowerMLP achieves higher accuracy and a training speed about 40 times faster than KAN in various tasks.
arXiv Detail & Related papers (2024-12-18T07:42:34Z)
On the expressiveness and spectral bias of KANs [17.42614039265962]
KANs were recently proposed as a potential alternative to the prevalent architectural backbone of many deep learning models, the multi-layer perceptron (MLP) KANs have seen success in various tasks of AI for science, with their empirical efficiency and accuracy demostrated in function regression, PDE solving, and many more scientific problems.
arXiv Detail & Related papers (2024-10-02T17:57:38Z)
Incorporating Arbitrary Matrix Group Equivariance into KANs [69.30866522377694]
Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains. However, spline functions may not respect symmetry in tasks, which is crucial prior knowledge in machine learning. We propose Equivariant Kolmogorov-Arnold Networks (EKAN) to broaden their applicability to more fields.
arXiv Detail & Related papers (2024-10-01T06:34:58Z)
A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks [43.70716358136333]
Kolmogorov- Networks (KAN) are based on a fundamentally different mathematical framework. KANs address several major issues insio, such as forgetting in continual learning scenarios. We extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision.
arXiv Detail & Related papers (2024-09-20T14:49:21Z)
KAN v.s. MLP for Offline Reinforcement Learning [4.3621896506713185]
Kolmogorov-Arnold Networks (KAN) is an emerging neural network architecture in machine learning. In this paper, we explore the incorporation of KAN into the actor and critic networks for offline reinforcement learning.
arXiv Detail & Related papers (2024-09-15T07:52:44Z)
Kolmogorov-Arnold Network for Online Reinforcement Learning [0.22615818641180724]
Kolmogorov-Arnold Networks (KANs) have shown potential as an alternative to Multi-Layer Perceptrons (MLPs) in neural networks. KANs provide universal function approximation with fewer parameters and reduced memory usage.
arXiv Detail & Related papers (2024-08-09T03:32:37Z)
KAN or MLP: A Fairer Comparison [63.794304207664176]
This paper offers a fairer and more comprehensive comparison of KAN and models across various tasks. We control the number of parameters and FLOPs to compare the performance of KAN and representation. We find that KAN's issue is more severe than that of forgetting in a standard class-incremental continual learning setting.
arXiv Detail & Related papers (2024-07-23T17:43:35Z)
KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning [27.638009679134523]
Graph Neural Networks (GNNs) have become the de facto tool for learning node and graph representations. We implement three new KAN-based GNN layers inspired respectively by the GCN, GAT and GIN layers. Our results indicate that KANs are on-par with or better than Arnolds on all tasks studied in this paper.
arXiv Detail & Related papers (2024-06-26T14:21:21Z)
ReLU Fields: The Little Non-linearity That Could [62.228229880658404]
We investigate what is the smallest change to grid-based representations that allows for retaining the high fidelity result ofs. We show that such an approach becomes competitive with the state-of-the-art.
arXiv Detail & Related papers (2022-05-22T13:42:31Z)
Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks. We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (tokens) We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
On Graph Neural Networks versus Graph-Augmented MLPs [51.23890789522705]
Graph-Augmented Multi-Layer Perceptrons (GA-MLPs) first augments node features with certain multi-hop operators on the graph. We prove a separation in expressive power between GA-MLPs and GNNs that grows exponentially in depth.
arXiv Detail & Related papers (2020-10-28T17:59:59Z)
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution. Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.