Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives
- URL: http://arxiv.org/abs/2504.15110v1
- Date: Mon, 21 Apr 2025 14:02:59 GMT
- Title: Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives
- Authors: Anastasis Kratsios, Takashi Furuya
- Abstract summary: Kolmogorov-Arnold Networks (KANs) have emerged as an improved backbone for most deep learning frameworks. We show that KANs can optimally approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain. We complement our approximation guarantee with a dimension-free estimate on the sample complexity of a residual KAN model.
- Score: 8.517406772939292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the Kolmogorov-Arnold superposition theorem, Kolmogorov-Arnold Networks (KANs) have recently emerged as an improved backbone for most deep learning frameworks, promising more adaptivity than their multilayer perceptron (MLP) predecessor by allowing for trainable spline-based activation functions. In this paper, we probe the theoretical foundations of the KAN architecture by showing that it can optimally approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain $\mathcal{X}$ in $\mathbb{R}^d$ at the optimal approximation rate with respect to any weaker Besov norm $B^{\alpha}_{p,q}(\mathcal{X})$, where $\alpha < s$. We complement our approximation guarantee with a dimension-free estimate on the sample complexity of a residual KAN model when learning a function of Besov regularity from $N$ i.i.d. noiseless samples. Our KAN architecture incorporates contemporary deep learning wisdom by leveraging residual/skip connections between layers.
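To make the architecture concrete, here is a minimal, hypothetical sketch of a KAN-style layer (a trainable univariate spline on every input-output edge) stacked with residual/skip connections in the spirit of the residual KAN model mentioned above. The piecewise-linear splines on a fixed grid, the class names, and all sizes are illustrative assumptions rather than the authors' construction.

```python
# Minimal KAN-style sketch (illustrative assumptions: piecewise-linear edge splines
# on a fixed knot grid; class names KANLayer/ResidualKAN are hypothetical).
import numpy as np

class KANLayer:
    """Maps R^{d_in} -> R^{d_out} via y_j = sum_i phi_{ij}(x_i), where each phi_{ij}
    is a univariate spline stored as its values at fixed knots."""

    def __init__(self, d_in, d_out, grid_size=16, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(x_min, x_max, grid_size)  # shared knot locations
        # coef[i, j, k] = value of phi_{ij} at knot k (the parameters one would train)
        self.coef = 0.1 * rng.standard_normal((d_in, d_out, grid_size))

    def __call__(self, x):
        # x: (batch, d_in). Evaluate each edge spline by linear interpolation, sum over i.
        batch, d_in = x.shape
        d_out = self.coef.shape[1]
        y = np.zeros((batch, d_out))
        for i in range(d_in):
            for j in range(d_out):
                y[:, j] += np.interp(x[:, i], self.grid, self.coef[i, j])
        return y

class ResidualKAN:
    """Stack of KAN layers with skip connections between layers; depth and width
    are arbitrary choices for the demo."""

    def __init__(self, d, depth=3):
        self.layers = [KANLayer(d, d, seed=k) for k in range(depth)]

    def __call__(self, x):
        for layer in self.layers:
            x = x + layer(x)  # residual/skip connection
        return x

if __name__ == "__main__":
    x = np.random.default_rng(1).uniform(-1, 1, size=(5, 3))
    print(ResidualKAN(d=3)(x).shape)  # (5, 3)
```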
Related papers
- Theory-to-Practice Gap for Neural Networks and Neural Operators [6.267574471145217]
We study the sampling complexity of learning with ReLU neural networks and neural operators. We show that the best-possible convergence rate in a Bochner $L^p$-norm is bounded by Monte-Carlo rates of order $1/p$.
arXiv Detail & Related papers (2025-03-23T21:45:58Z) - Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods [0.0]
We consider functions as expectations of Sobolev functions over all possible one-dimensional projections of the data.
This framework is similar to kernel ridge regression, where the kernel is $\mathbb{E}_w\left(k^{(B)}(w^\top x, w^\top x')\right)$, with $k^{(B)}(a,b) := \min(|a|, |b|)\,\mathds{1}_{ab>0}$ the Brownian kernel, and the distribution of the projections $w$ is learnt.
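As a quick illustration of the kernel written above, the sketch below estimates $\mathbb{E}_w(k^{(B)}(w^\top x, w^\top x'))$ by Monte Carlo over projection directions. Drawing $w$ uniformly on the sphere is an assumption made only for this demo; in the paper the distribution of the projections is itself learnt.

```python
# Monte-Carlo estimate of the projected Brownian kernel
#   K(x, x') = E_w[ k_B(w^T x, w^T x') ],  k_B(a, b) = min(|a|, |b|) * 1{ab > 0}.
# The uniform-on-the-sphere choice for w is illustrative only.
import numpy as np

def brownian_kernel_1d(a, b):
    return np.minimum(np.abs(a), np.abs(b)) * (a * b > 0)

def projected_brownian_kernel(x, x_prime, n_proj=10_000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_proj, x.shape[0]))
    w /= np.linalg.norm(w, axis=1, keepdims=True)         # random unit directions
    return brownian_kernel_1d(w @ x, w @ x_prime).mean()  # average over projections

if __name__ == "__main__":
    x, x_prime = np.array([1.0, 0.0]), np.array([0.5, 0.5])
    print(projected_brownian_kernel(x, x_prime))
```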
arXiv Detail & Related papers (2024-07-24T13:46:50Z) - Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z) - Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
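A hedged sketch of the reward-redistribution idea (only the least-squares step; the pessimistic value-iteration stage of PARTED is omitted, and the linear feature model and ridge term are assumptions for illustration): the trajectory return is regressed on the summed per-step features, and the fitted model then yields per-step proxy rewards.

```python
# Least-squares reward redistribution sketch: only the trajectory return
# R = sum_t r_t is observed, and a linear proxy reward r_hat = phi^T theta is fit.
# (Illustrative assumptions: linear features, ridge regularisation.)
import numpy as np

def redistribute_rewards(trajectory_features, returns, ridge=1e-3):
    """trajectory_features: list of (H, d) arrays; returns: (N,) trajectory returns."""
    summed = np.stack([phi.sum(axis=0) for phi in trajectory_features])  # (N, d)
    d = summed.shape[1]
    theta = np.linalg.solve(summed.T @ summed + ridge * np.eye(d), summed.T @ returns)
    return [phi @ theta for phi in trajectory_features]  # per-step proxy rewards

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.standard_normal((10, 4)) for _ in range(50)]
    true_theta = np.array([1.0, -0.5, 0.2, 0.0])
    rets = np.array([f.sum(axis=0) @ true_theta for f in feats])
    print(redistribute_rewards(feats, rets)[0].shape)  # (10,) proxy rewards per step
```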
arXiv Detail & Related papers (2022-06-13T19:11:22Z) - DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for learning.
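For intuition about what "sending compressed vectors" means, here is a standard unbiased rand-$k$ sparsifier of the kind used in this line of work; DASHA's actual estimators, momentum, and variance-reduction updates are not reproduced here.

```python
# Unbiased rand-k compressor (illustrative; not DASHA itself): keep k random
# coordinates of a vector and rescale by d/k so the compression is unbiased.
import numpy as np

def rand_k(vector, k, rng):
    d = vector.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(vector)
    out[idx] = vector[idx] * (d / k)  # rescaling makes E[rand_k(v)] = v
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal(1000)
    est = np.mean([rand_k(g, 50, rng) for _ in range(5000)], axis=0)
    print(np.linalg.norm(est - g) / np.linalg.norm(g))  # shrinks as more draws are averaged
```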
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Universal Approximation Property of Neural Ordinary Differential Equations [19.861764482790544]
We show that NODEs can form an $L^p$-universal approximator for continuous maps under certain conditions.
We also show their stronger approximation property, namely the $\sup$-universality for approximating a large class of diffeomorphisms.
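A minimal sketch of the NODE map in question, with a fixed random tanh network standing in for a trained vector field (an assumption for the demo): the model's output is the time-one flow of $\dot{x} = f_\theta(x, t)$ started at the input.

```python
# Neural ODE forward map x(0) -> x(1) obtained by integrating dx/dt = f_theta(x, t).
# The random tanh vector field below is a stand-in for a trained one.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
d, hidden = 2, 16
W1, b1 = rng.standard_normal((hidden, d + 1)), rng.standard_normal(hidden)
W2 = rng.standard_normal((d, hidden)) / hidden

def vector_field(t, x):
    h = np.tanh(W1 @ np.append(x, t) + b1)  # time-dependent field f_theta(x, t)
    return W2 @ h

def node_map(x0, t1=1.0):
    sol = solve_ivp(vector_field, (0.0, t1), x0, rtol=1e-6, atol=1e-8)
    return sol.y[:, -1]  # state at time t1

if __name__ == "__main__":
    print(node_map(np.array([1.0, -0.5])))
```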
arXiv Detail & Related papers (2020-12-04T05:53:21Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Random Vector Functional Link Networks for Function Approximation on Manifolds [8.535815777849786]
Single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space.
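Below is a small sketch of this random-feature recipe: input-to-hidden weights and biases are drawn once at random and frozen, and only the output weights are fit by ridge-regularised least squares. The cosine activation, width, and ridge parameter are illustrative choices, not those of the paper.

```python
# Random-feature network sketch: hidden layer is random and fixed; only the
# output weights are learned (by regularised least squares).
import numpy as np

class RandomFeatureNet:
    def __init__(self, d_in, n_hidden=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_hidden, d_in))  # random, never trained
        self.b = rng.uniform(-np.pi, np.pi, n_hidden)   # random biases
        self.beta = None                                # trained output weights

    def _features(self, X):
        return np.cos(X @ self.W.T + self.b)

    def fit(self, X, y, ridge=1e-6):
        H = self._features(X)
        self.beta = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, (500, 3))
    y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]
    model = RandomFeatureNet(d_in=3).fit(X, y)
    print(np.mean((model.predict(X) - y) ** 2))  # small in-sample error
```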
arXiv Detail & Related papers (2020-07-30T23:50:44Z) - Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$, where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z) - Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning.
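As one concrete example of optimization constrained to $O(d)$, here is a single Cayley-transform retraction step, a standard way to move along the orthogonal group; the paper's stochastic-flow-based methods are not reproduced here.

```python
# One Cayley-retraction descent step on the orthogonal group O(d) (illustrative;
# not the paper's algorithm): project the Euclidean gradient to a skew-symmetric
# direction and retract so that the iterate stays orthogonal.
import numpy as np

def cayley_step(X, G, lr=0.1):
    A = G @ X.T - X @ G.T                     # skew-symmetric: A = -A^T
    I = np.eye(X.shape[0])
    # X_new = (I + lr/2 * A)^{-1} (I - lr/2 * A) X remains orthogonal
    return np.linalg.solve(I + 0.5 * lr * A, (I - 0.5 * lr * A) @ X)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a starting point on O(4)
    G = rng.standard_normal((4, 4))                   # placeholder Euclidean gradient
    Q_new = cayley_step(Q, G)
    print(np.allclose(Q_new.T @ Q_new, np.eye(4)))    # True: still on O(4)
```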
arXiv Detail & Related papers (2020-03-30T15:37:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.