On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators
- URL: http://arxiv.org/abs/2509.19830v1
- Date: Wed, 24 Sep 2025 07:22:03 GMT
- Title: On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators
- Authors: Wei Liu, Eleni Chatzi, Zhilu Lai
- Abstract summary: Kolmogorov-Arnold Networks (KANs) offer a structured and interpretable framework for multivariate function approximation. We prove that both additive and hybrid additive-multiplicative KANs attain the minimax-optimal convergence rate. We derive guidelines for selecting the optimal number of knots in the B-splines.
- Score: 4.595923896761076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) offer a structured and interpretable framework for multivariate function approximation by composing univariate transformations through additive or multiplicative aggregation. This paper establishes theoretical convergence guarantees for KANs when the univariate components are represented by B-splines. We prove that both additive and hybrid additive-multiplicative KANs attain the minimax-optimal convergence rate $O(n^{-2r/(2r+1)})$ for functions in Sobolev spaces of smoothness $r$. We further derive guidelines for selecting the optimal number of knots in the B-splines. The theory is supported by simulation studies that confirm the predicted convergence rates. These results provide a theoretical foundation for using KANs in nonparametric regression and highlight their potential as a structured alternative to existing methods.
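The knot-selection guideline admits a minimal illustration (a sketch, not the authors' implementation): balancing a squared approximation bias of order $K^{-2r}$ against an estimation variance of order $K/n$ suggests $K \asymp n^{1/(2r+1)}$ interior knots, which recovers the $O(n^{-2r/(2r+1)})$ rate. The sketch below fits an additive KAN-style model $f(x) = \sum_j g_j(x_j)$ with each $g_j$ in a cubic B-spline basis (hand-rolled Cox-de Boor recursion) by least squares; the function names, the test function, and the assumed smoothness $r = 2$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def bspline_design(x, n_interior, degree=3):
    """Cox-de Boor evaluation of a B-spline basis on [0, 1]."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    t = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    x = np.clip(x, 0.0, 1.0 - 1e-12)  # keep the right endpoint inside the last interval
    m = len(t) - 1
    # Degree-0 basis: indicator functions of the knot intervals.
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(m)], float).T
    for d in range(1, degree + 1):
        new = np.zeros((len(x), m - d))
        for i in range(m - d):
            if t[i + d] > t[i]:
                new[:, i] += (x - t[i]) / (t[i + d] - t[i]) * B[:, i]
            if t[i + d + 1] > t[i + 1]:
                new[:, i] += (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[:, i + 1]
        B = new
    return B  # shape (len(x), n_interior + degree + 1)

def fit_additive_kan(X, y, n_interior):
    """Least-squares fit of the additive model f(x) = sum_j g_j(x_j)."""
    design = lambda Z: np.hstack(
        [bspline_design(Z[:, j], n_interior) for j in range(Z.shape[1])]
    )
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Z: design(Z) @ coef

rng = np.random.default_rng(0)
n, d = 2000, 2
X = rng.uniform(size=(n, d))
truth = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2  # an additive test function
y = truth + 0.1 * rng.standard_normal(n)
K = max(1, round(n ** (1.0 / 5.0)))  # K ~ n^{1/(2r+1)} with assumed r = 2
f_hat = fit_additive_kan(X, y, K)
mse = float(np.mean((f_hat(X) - truth) ** 2))
```

With `n = 2000` the rule gives `K = 5` interior knots per coordinate; halving or doubling `K` away from this order degrades either the bias or the variance term, which is the tradeoff the paper's guideline resolves.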
Related papers
- Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems. VarEOT is based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable minimization over an auxiliary positive normalizer. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
arXiv Detail & Related papers (2026-02-02T15:48:44Z) - Geometric Convergence Analysis of Variational Inference via Bregman Divergences [3.7098038388802252]
Variational Inference (VI) provides a scalable framework for approximate inference by maximizing the Evidence Lower Bound (ELBO). We establish a novel theoretical framework for analyzing the convergence of this objective by exploiting the structure of exponential family distributions.
arXiv Detail & Related papers (2025-10-17T11:30:05Z) - Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vector quantile regression (CVQR). CVQR combines neural optimal transport with quantile regression and applies it to multivariate conformal prediction.
arXiv Detail & Related papers (2025-09-29T19:50:19Z) - Approximation Rates in Besov Norms and Sample-Complexity of Kolmogorov-Arnold Networks with Residual Connections [9.817834520159936]
Kolmogorov-Arnold Networks (KANs) have emerged as an alternative backbone for many deep learning frameworks. We show that KANs can optimally approximate any Besov function in $B^s_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain.
arXiv Detail & Related papers (2025-04-21T14:02:59Z) - Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesically strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Controlling the Complexity and Lipschitz Constant improves polynomial nets [55.121200972539114]
We derive new complexity bounds for the set of Coupled CP-Decomposition (CCP) and Nested Coupled CP-decomposition (NCP) models of Polynomial Nets.
We propose a principled regularization scheme that we evaluate experimentally in six datasets and show that it improves the accuracy as well as the robustness of the models to adversarial perturbations.
arXiv Detail & Related papers (2022-02-10T14:54:29Z) - Equivalence of Convergence Rates of Posterior Distributions and Bayes Estimators for Functions and Nonparametric Functionals [4.375582647111708]
We study the posterior contraction rates of a Bayesian method with Gaussian process priors in nonparametric regression.
For a general class of kernels, we establish convergence rates of the posterior measure of the regression function and its derivatives.
Our proof shows that, under certain conditions, to any convergence rate of Bayes estimators there corresponds the same convergence rate of the posterior distributions.
arXiv Detail & Related papers (2020-11-27T19:11:56Z) - Tractable Approximate Gaussian Inference for Bayesian Neural Networks [1.933681537640272]
We propose an analytical method for performing tractable approximate Gaussian inference (TAGI) in Bayesian neural networks.
The method has a computational complexity of $\mathcal{O}(n)$ with respect to the number of parameters $n$. Tests performed on regression and classification benchmarks confirm that, for the same network architecture, it matches the performance of existing methods relying on gradient backpropagation.
arXiv Detail & Related papers (2020-04-20T13:37:08Z) - The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks [46.677567663908185]
Variational Bayesian inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights.
Recent work has explored ever richer parameterizations of the approximate posterior in the hope of improving performance.
We find that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance.
arXiv Detail & Related papers (2020-02-07T07:33:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.