Related papers: Want to train KANS at scale? Now UKAN!

URL: http://arxiv.org/abs/2408.11200v4
Date: Thu, 09 Oct 2025 02:32:06 GMT
Title: Want to train KANS at scale? Now UKAN!
Authors: Alireza Moradzadeh, Srimukh Prasad Veccham, Lukasz Wawrzyniak, Miles Macklin, Saee G. Paliwal
Abstract summary: We present Unbounded Kolmogorov-Arnold Networks (UKANs), a method that removes the need for bounded grids in traditional Kolmogorov-Arnold Networks (KANs). UKANs couple multilayer perceptrons with KANs by feeding the positional encoding of grid groups into a coefficient-generator (CG) model, enabling function approximation on unbounded domains without requiring data normalization.
Score: 2.9666099400348607
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Kolmogorov-Arnold Networks (KANs) have recently emerged as a powerful alternative to traditional multilayer perceptrons. However, their reliance on predefined, bounded grids restricts their ability to approximate functions on unbounded domains. To address this, we present Unbounded Kolmogorov-Arnold Networks (UKANs), a method that removes the need for bounded grids in traditional Kolmogorov-Arnold Networks (KANs). The key innovation of this method is a coefficient-generator (CG) model that produces, on the fly, only the B-spline coefficients required locally on an unbounded symmetric grid. UKANs couple multilayer perceptrons with KANs by feeding the positional encoding of grid groups into the CG model, enabling function approximation on unbounded domains without requiring data normalization. To reduce the computational cost of both UKANs and KANs, we introduce a GPU-accelerated library that lowers B-spline evaluation complexity by a factor proportional to the grid size, enabling large-scale learning by leveraging efficient memory management, in line with recent software advances such as FlashAttention and FlashFFTConv. Performance benchmarking confirms the superior memory and computational efficiency of our accelerated KAN (warpKAN), and UKANs, showing a 3-30x speed-up and up to 1000x memory reduction compared to vanilla KANs. Experiments on regression, classification, and generative tasks demonstrate the effectiveness of UKANs to match or surpass KAN accuracy. Finally, we use both accelerated KAN and UKAN in a molecular property prediction task, establishing the feasibility of large-scale end-to-end training with our optimized implementation.

Related papers

TruKAN: Towards More Efficient Kolmogorov-Arnold Networks Using Truncated Power Functions [0.5097809301149341]
We propose TruKAN to address the trade-off between computational efficiency and adherence to Kolmogorov-Arnold Network (KAN) principles. TruKAN replaces the B-spline basis in KAN with a family of truncated power functions derived from k-order theory. Overall, TruKAN outperforms other KAN models in terms of accuracy, computational efficiency and memory usage on the complex vision task.
arXiv Detail & Related papers (2026-02-02T19:33:02Z)
EUGens: Efficient, Unified, and General Dense Layers [56.498769704575544]
We propose a new class of dense layers that generalize standard fully-connected feedforward layers (FFLs): Efficient, Unified, and General dense layers (EUGens). EUGens leverage random features to approximate standard FFLs and go beyond them by incorporating a direct dependence on the input norms in their computations.
arXiv Detail & Related papers (2026-01-30T05:01:03Z)
From GNNs to Symbolic Surrogates via Kolmogorov-Arnold Networks for Delay Prediction [3.571534406261392]
First, we implement a heterogeneous GNN with attention-based message passing, establishing a strong neural baseline. Second, we propose FlowKANet, in which Kolmogorov-Arnold Networks replace standard layers, reducing trainable parameters. Third, we distill the model into symbolic surrogate models using block-wise regression, producing closed-form equations that eliminate trainable computation.
arXiv Detail & Related papers (2025-12-24T02:05:46Z)
K-DAREK: Distance Aware Error for Kurkova Kolmogorov Networks [3.460138063155115]
We develop a novel learning algorithm, distance-aware error for Kurkova-Kolmogorov networks (K-DAREK), for efficient and interpretable function approximation with uncertainty quantification. Our approach establishes robust error bounds that are distance-aware; this means they reflect the proximity of a test point to its nearest training points.
arXiv Detail & Related papers (2025-10-24T20:49:59Z)
Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z)
LASE: Learned Adjacency Spectral Embeddings [7.612218105739107]
We learn nodal Adjacency Spectral Embeddings (ASE) from graph inputs. LASE is interpretable, parameter efficient, and robust to inputs with unobserved edges. LASE layers combine Graph Convolutional Network (GCN) and fully-connected Graph Attention Network (GAT) modules.
arXiv Detail & Related papers (2024-12-23T17:35:19Z)
Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers. We identify three key challenges: (C1) Base function, (C2) Inefficiency, and (C3) Weight. With these designs, KAT outperforms traditional MLP-based transformers.
arXiv Detail & Related papers (2024-09-16T17:54:51Z)
SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions [0.0]
We present a model in which learnable grids of B-Spline activation functions are replaced by grids of re-weighted sine functions (SineKAN). We show that our model can perform better than or comparable to B-Spline KAN models and an alternative KAN implementation based on periodic cosine and sine functions.
arXiv Detail & Related papers (2024-07-04T20:53:19Z)
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC). We show they are exact for deep linear networks with weight-sharing in their respective setting. We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets. In this paper, we introduce model pruning for HFL in wireless networks to reduce the neural network scale. We show that our proposed HFL with model pruning achieves learning accuracy similar to that of HFL without model pruning and reduces communication cost by about 50 percent.
arXiv Detail & Related papers (2023-05-15T22:04:49Z)
On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight approximation. A holistic framework is proposed for model compression from a novel perspective of nonconvex optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z)
EGRC-Net: Embedding-induced Graph Refinement Clustering Network [66.44293190793294]
We propose a novel graph clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net). EGRC-Net effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance. Our proposed methods consistently outperform several state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-19T09:08:43Z)
Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy. To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN. Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis [7.491711487306447]
This paper introduces a novel generative adversarial network (GAN) for synthesizing large-scale databases. We propose a new formulation for deriving a vector incorporating both binary and discrete features simultaneously. We present a regularization scheme towards limiting unprecedented variations on its weight vectors during training.
arXiv Detail & Related papers (2022-05-24T01:14:59Z)
Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer. In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph. Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption. They can suffer from ill-posedness and convergence instability. This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks. We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB). BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions [11.769185588579488]
We present state-of-the-art performance with lower memory consumption and latency, along with characteristics suited to accelerator implementation. Our proposal uses memory proportional to the number of vertices in the graph, in contrast to competing methods which require memory proportional to the number of edges. We propose aggregator fusion, a technique to enable GNNs to significantly boost their representational power, with only a small increase in latency of 19% over standard sparse matrix multiplication.
arXiv Detail & Related papers (2021-04-03T20:54:36Z)
Self Sparse Generative Adversarial Networks [73.590634413751]
Generative Adversarial Networks (GANs) are an unsupervised generative model that learns data distribution through adversarial training. We propose a Self Sparse Generative Adversarial Network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem.
arXiv Detail & Related papers (2021-01-26T04:49:12Z)
Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing [34.69732430310801]
We propose Berrut Approximated Coded Computing (BACC) as an alternative approach to deal with the straggler effect. BACC is proven to be numerically stable with low computational complexity. In particular, BACC is used to train a deep neural network on a cluster of servers.
arXiv Detail & Related papers (2020-09-17T14:23:38Z)
Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction [43.736296034673124]
Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck in solving the Reduced Camera System (RCS) whose dimension is proportional to the camera number. We propose a bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM to improve the efficiency and scalability.
arXiv Detail & Related papers (2020-08-02T10:26:09Z)
Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach [55.44107800525776]
Graph Convolutional Networks (GCNs) are state-of-the-art graph based representation learning models. In this paper, we revisit GCN based Collaborative Filtering (CF) based Recommender Systems (RS). We show that removing non-linearities would enhance recommendation performance, consistent with the theories in simple graph convolutional networks. We propose a residual network structure that is specifically designed for CF with user-item interaction modeling.
arXiv Detail & Related papers (2020-01-28T04:41:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.