Lookup multivariate Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2509.07103v2
- Date: Fri, 17 Oct 2025 12:56:09 GMT
- Title: Lookup multivariate Kolmogorov-Arnold Networks
- Authors: Sergey Pozdnyakov, Philippe Schwaller
- Abstract summary: High-dimensional linear mappings dominate both the parameter count and the computational cost of most modern deep-learning models. We introduce a general-purpose drop-in replacement, lookup multivariate Kolmogorov-Arnold Networks (lmKANs). lmKANs deliver a substantially better trade-off between capacity and inference cost.
- Score: 5.639419519849473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-dimensional linear mappings, or linear layers, dominate both the parameter count and the computational cost of most modern deep-learning models. We introduce a general-purpose drop-in replacement, lookup multivariate Kolmogorov-Arnold Networks (lmKANs), which deliver a substantially better trade-off between capacity and inference cost. Our construction expresses a general high-dimensional mapping through trainable low-dimensional multivariate functions. These functions can carry dozens or hundreds of trainable parameters each, and yet it takes only a few multiplications to compute them because they are implemented as spline lookup tables. Empirically, lmKANs reduce inference FLOPs by up to 6.0x while matching the flexibility of MLPs in general high-dimensional function approximation. In another feedforward fully connected benchmark, on the tabular-like dataset of randomly displaced methane configurations, lmKANs enable more than 10x higher H100 throughput at equal accuracy. In convolutional neural network frameworks, lmKAN-based CNNs cut inference FLOPs at matched accuracy by 1.6-2.1x on CIFAR-10 and by 1.7x on ImageNet-1k. Our code, including dedicated CUDA kernels, is available online at https://github.com/schwallergroup/lmkan.
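The abstract's core idea, a trainable low-dimensional function whose many parameters cost only a few multiplications per evaluation because it is stored as a lookup table, can be illustrated with a minimal sketch. This is a simplified stand-in using bilinear interpolation rather than the paper's splines, and the class name and API are hypothetical, not the lmKAN library's.

```python
import numpy as np

class LookupFunction2D:
    """A trainable 2D function stored as a grid of values and evaluated by
    bilinear interpolation -- a simplified, hypothetical stand-in for the
    spline lookup tables described in the lmKAN abstract."""

    def __init__(self, grid_size=16, lo=-1.0, hi=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        self.lo, self.hi, self.n = lo, hi, grid_size
        # (grid_size+1) x (grid_size+1) table of trainable function values:
        # hundreds of parameters, yet each evaluation touches only 4 of them.
        self.table = rng.normal(scale=0.1, size=(grid_size + 1, grid_size + 1))

    def __call__(self, x, y):
        # Map inputs to continuous grid coordinates, clamped into the table.
        tx = np.clip((x - self.lo) / (self.hi - self.lo) * self.n, 0, self.n - 1e-9)
        ty = np.clip((y - self.lo) / (self.hi - self.lo) * self.n, 0, self.n - 1e-9)
        i, j = tx.astype(int), ty.astype(int)
        fx, fy = tx - i, ty - j
        t = self.table
        # Bilinear interpolation: a handful of multiplications per call,
        # independent of how many entries the table holds.
        return ((1 - fx) * (1 - fy) * t[i, j] + fx * (1 - fy) * t[i + 1, j]
                + (1 - fx) * fy * t[i, j + 1] + fx * fy * t[i + 1, j + 1])

f = LookupFunction2D()
out = f(np.array([0.3]), np.array([-0.5]))  # one cheap 2D table lookup
```

A full lmKAN layer would compose many such functions over pairs of input coordinates and train the tables by gradient descent; this sketch only shows why the per-evaluation cost is decoupled from the parameter count.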
Related papers
- SUPN: Shallow Universal Polynomial Networks [2.817874121826956]
Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation. We propose shallow universal polynomial networks (SUPNs) to produce a suitable approximation. We show that SUPNs converge at the same rate as the best approximation of the same degree.
arXiv Detail & Related papers (2025-11-26T14:06:42Z)
- K-DAREK: Distance Aware Error for Kurkova Kolmogorov Networks [3.460138063155115]
We develop a novel learning algorithm, distance-aware error for Kurkova-Kolmogorov networks (K-DAREK), for efficient and interpretable function approximation with uncertainty quantification. Our approach establishes robust error bounds that are distance-aware: they reflect the proximity of a test point to its nearest training points.
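The notion of a distance-aware error bound, one that tightens near the training set and loosens far from it, can be sketched generically. This is a Lipschitz-style illustration only, not the K-DAREK algorithm; the `eps` and `lipschitz` constants are assumed for the example.

```python
import numpy as np

def distance_aware_bound(x_test, x_train, eps=0.05, lipschitz=2.0):
    """Generic Lipschitz-style illustration of a distance-aware error bound:
    the bound grows with the distance from each test point to its nearest
    training point. The constants eps and lipschitz are assumptions for
    this sketch, not values from the K-DAREK paper."""
    # Pairwise distances from each test point to every training point.
    d = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=-1)
    nearest = d.min(axis=1)            # distance to the closest training sample
    return eps + lipschitz * nearest   # tighter bound near the training set

x_train = np.array([[0.0, 0.0], [1.0, 0.0]])
x_test = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 4.0]])
bounds = distance_aware_bound(x_test, x_train)
```

The first test point sits on a training sample, so its bound collapses to `eps`; the farther points receive progressively looser bounds, which is the qualitative behavior the abstract describes.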
arXiv Detail & Related papers (2025-10-24T20:49:59Z)
- Facet: highly efficient E(3)-equivariant networks for interatomic potentials [6.741915610607818]
Computational materials discovery is limited by the high cost of first-principles calculations. Machine learning potentials that predict energies from crystal structures are promising, but existing methods face computational bottlenecks. We present Facet, a GNN architecture for efficient ML potentials.
arXiv Detail & Related papers (2025-09-10T09:06:24Z)
- KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation [0.9355993154058798]
We introduce KHRONOS, an AI framework for model-based, model-free, and model-inversion tasks. KHRONOS constructs continuously differentiable target fields with a hierarchical composition of per-dimension kernel expansions. For inverse problems, KHRONOS facilitates rapid, iterative level-set recovery in only a few forward evaluations, with sub-microsecond per-sample latency.
arXiv Detail & Related papers (2025-05-19T16:29:07Z)
- No Free Lunch From Random Feature Ensembles [23.661623767100384]
Given a budget on total model size, one must decide whether to train a single, large neural network or to combine the predictions of many smaller networks. We prove that when a fixed number of trainable parameters is partitioned among $K$ independently trained models, $K=1$ achieves optimal performance. We identify conditions on the kernel and task eigenstructure under which ensembles can achieve near-optimal scaling laws.
arXiv Detail & Related papers (2024-12-06T20:55:27Z)
- Incorporating Arbitrary Matrix Group Equivariance into KANs [69.30866522377694]
Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains. We propose Equivariant Kolmogorov-Arnold Networks (EKAN) to broaden their applicability to more fields.
arXiv Detail & Related papers (2024-10-01T06:34:58Z)
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
- Comparison of Affine and Rational Quadratic Spline Coupling and Autoregressive Flows through Robust Statistical Tests [0.0]
We propose an in-depth comparison of coupling and autoregressive flows based on symmetric and non-symmetric bijectors. We focus on a set of multimodal target distributions of increasing dimensionality, ranging from 4 to 400. Our results indicate that the A-RQS algorithm stands out both in terms of accuracy and training speed.
arXiv Detail & Related papers (2023-02-23T13:34:01Z)
- Adaptive Split-Fusion Transformer [90.04885335911729]
We propose an Adaptive Split-Fusion Transformer (ASF-former) to treat convolutional and attention branches differently with adaptive weights.
Experiments on standard benchmarks, such as ImageNet-1K, show that our ASF-former outperforms its CNN, transformer counterparts, and hybrid pilots in terms of accuracy.
arXiv Detail & Related papers (2022-04-26T10:00:28Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
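The multiresolution hash table described above can be sketched compactly: at each resolution level, a point's grid cell hashes into a small table of trainable feature vectors, and the per-level features are concatenated. This is a minimal, assumption-laden illustration (the table sizes, growth factor, and hashing primes are chosen for the example, and the corner interpolation used in the actual method is omitted), not the Instant-NGP implementation.

```python
import numpy as np

def hash_encode(x, num_levels=4, table_size=2**10, feat_dim=2,
                base_res=4, growth=2.0, rng=None):
    """Minimal sketch of a multiresolution hash encoding: at each level,
    a point's integer grid cell indexes a small hash table of trainable
    feature vectors; features from all levels are concatenated.
    (The interpolation between cell corners is omitted for brevity.)"""
    rng = rng or np.random.default_rng(0)
    # One feature table per level (random here, trainable in practice).
    tables = [rng.normal(size=(table_size, feat_dim)) for _ in range(num_levels)]
    primes = np.array([1, 2654435761], dtype=np.uint64)  # spatial-hash primes
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth**level)
        cell = np.floor(x * res).astype(np.uint64)       # integer grid cell
        idx = (cell * primes).sum(axis=-1) % table_size  # hash cell -> slot
        feats.append(table[idx.astype(int)])
    return np.concatenate(feats, axis=-1)  # shape: (N, num_levels * feat_dim)

pts = np.random.default_rng(1).uniform(size=(5, 2))  # 5 points in [0,1)^2
enc = hash_encode(pts)
```

Because the tables are small and indexed by hashing, the encoding's memory footprint is fixed regardless of scene resolution, which is what lets the downstream network stay small.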
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs now maintain performance with dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- DS-FACTO: Doubly Separable Factorization Machines [4.281959480566438]
Factorization Machines (FM) are a powerful class of models that incorporate higher-order interactions among features to add more expressive power to linear models.
Despite using a low-rank representation for the pairwise features, the memory overheads of using factorization machines on large-scale real-world datasets can be prohibitively high.
Traditional algorithms for FM that work on a single machine are not equipped to handle this scale; therefore, using a distributed algorithm to parallelize computation across a cluster is inevitable.
arXiv Detail & Related papers (2020-04-29T03:36:28Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.