Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study
- URL: http://arxiv.org/abs/2509.03417v1
- Date: Wed, 03 Sep 2025 15:45:28 GMT
- Title: Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study
- Authors: Spyros Rigas, Dhruv Verma, Georgios Alexandridis, Yixuan Wang,
- Abstract summary: Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions. This work proposes two theory-driven approaches inspired by LeCun and Glorot, as well as an empirical power-law family with tunable exponents.
- Score: 9.450853542720909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions, offering enhanced flexibility and interpretability. While KANs have been applied successfully across scientific and machine learning tasks, their initialization strategies remain largely unexplored. In this work, we study initialization schemes for spline-based KANs, proposing two theory-driven approaches inspired by LeCun and Glorot, as well as an empirical power-law family with tunable exponents. Our evaluation combines large-scale grid searches on function fitting and forward PDE benchmarks, an analysis of training dynamics through the lens of the Neural Tangent Kernel, and evaluations on a subset of the Feynman dataset. Our findings indicate that the Glorot-inspired initialization significantly outperforms the baseline in parameter-rich models, while power-law initialization achieves the strongest performance overall, both across tasks and for architectures of varying size. All code and data accompanying this manuscript are publicly available at https://github.com/srigas/KAN_Initialization_Schemes.
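The three families of schemes described in the abstract lend themselves to a compact illustration. The sketch below shows one plausible way the scalings could be applied when drawing the spline coefficients of a single KAN layer; the function name, the coefficient tensor shape, and the exact role of the exponents alpha and beta are assumptions made for illustration, not the formulas used in the paper.

```python
import numpy as np

def init_spline_coeffs(fan_in, fan_out, num_basis,
                       scheme="glorot", alpha=1.0, beta=0.5, rng=None):
    """Draw B-spline coefficients for one KAN layer (illustrative shapes and scalings).

    scheme:
      - "lecun":     variance ~ 1 / fan_in              (LeCun-style)
      - "glorot":    variance ~ 2 / (fan_in + fan_out)  (Glorot-style)
      - "power_law": std ~ fan_in**(-alpha) * num_basis**(-beta),
                     a hypothetical stand-in for the tunable-exponent family
    """
    rng = np.random.default_rng() if rng is None else rng
    if scheme == "lecun":
        std = np.sqrt(1.0 / fan_in)
    elif scheme == "glorot":
        std = np.sqrt(2.0 / (fan_in + fan_out))
    elif scheme == "power_law":
        std = fan_in ** (-alpha) * num_basis ** (-beta)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # one coefficient per (input feature, output feature, basis function)
    return rng.normal(0.0, std, size=(fan_in, fan_out, num_basis))
```

Under these assumptions, sweeping alpha, beta, and the choice of scheme per architecture would mirror the kind of grid search the abstract describes.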
Related papers
- A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning [51.505728136705564]
We develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks. We find that different initialization choices place the network into four distinct fine-tuning regimes. A smaller scale in earlier layers enables the network to both reuse and refine its features, leading to superior generalization.
arXiv Detail & Related papers (2026-02-23T17:19:33Z) - Improving Set Function Approximation with Quasi-Arithmetic Neural Networks [23.73257235603082]
We propose quasi-arithmetic neural networks (QUANNs). We provide a theoretical analysis showing that QUANNs are universal approximators for a broad class of common set-function decompositions.
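For context on the name, a quasi-arithmetic (Kolmogorov) mean aggregates values through an invertible map f as f^{-1}(mean(f(x_i))). The snippet below illustrates only that aggregation primitive, not the QUANN architecture itself, and the choices of f are arbitrary examples.

```python
import numpy as np

def quasi_arithmetic_mean(x, f, f_inv):
    """Kolmogorov / quasi-arithmetic mean: f^{-1}( mean( f(x_i) ) )."""
    return f_inv(np.mean(f(np.asarray(x, dtype=float))))

# f = log recovers the geometric mean, f = identity the arithmetic mean
print(quasi_arithmetic_mean([1.0, 4.0, 16.0], np.log, np.exp))            # 4.0
print(quasi_arithmetic_mean([1.0, 4.0, 16.0], lambda t: t, lambda t: t))  # 7.0
```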
arXiv Detail & Related papers (2026-02-04T18:36:31Z) - Deep Neural Networks as Iterated Function Systems and a Generalization Bound [2.7920304852537536]
We show that two important deep architectures can be viewed as, or canonically associated with, place-dependent IFS. We derive a Wasserstein bound for generative modeling that controls the collage-type approximation error between the data distribution and its image.
arXiv Detail & Related papers (2026-01-27T07:32:49Z) - A Practitioner's Guide to Kolmogorov-Arnold Networks [2.304209804119502]
Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to traditional Multilayer Perceptrons (MLPs). This review provides a systematic and comprehensive overview of the rapidly expanding KAN landscape.
arXiv Detail & Related papers (2025-10-28T03:03:44Z) - Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z) - A Survey on Kolmogorov-Arnold Network [0.0]
This review explores the theoretical foundations, evolution, applications, and future potential of Kolmogorov-Arnold Networks (KAN).
KANs distinguish themselves from traditional neural networks by using learnable, spline-parameterized functions instead of fixed activation functions.
This paper highlights KAN's role in modern neural architectures and outlines future directions to improve its computational efficiency, interpretability, and scalability in data-intensive applications.
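As a concrete picture of what "learnable, spline-parameterized functions" means in the entry above, the sketch below evaluates an activation as a linear combination of simple hat (order-1 B-spline) bases on a uniform grid. Real KAN implementations typically use cubic B-splines plus a base activation, so this is a simplified stand-in rather than any particular library's code.

```python
import numpy as np

def spline_activation(x, coeffs, grid):
    """phi(x) = sum_i c_i * B_i(x), with hat bases B_i centered on a uniform grid.
    A simplified stand-in for the cubic B-spline activations used in KANs."""
    x = np.asarray(x, dtype=float)
    h = grid[1] - grid[0]                                                   # uniform knot spacing
    basis = np.maximum(0.0, 1.0 - np.abs(x[:, None] - grid[None, :]) / h)   # shape (N, G)
    return basis @ coeffs                                                   # shape (N,)

grid = np.linspace(-1.0, 1.0, 11)
coeffs = np.random.default_rng(0).normal(size=grid.size)   # the trainable parameters
y = spline_activation(np.array([-0.3, 0.0, 0.7]), coeffs, grid)
```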
arXiv Detail & Related papers (2024-11-09T05:54:17Z) - Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs). We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Spatiotemporal Graph Learning with Direct Volumetric Information Passing and Feature Enhancement [62.91536661584656]
We propose a dual-module framework for spatiotemporal learning, the Cell-embedded and Feature-enhanced Graph Neural Network (CeFeGNN). We embed learnable cell attributions into the common node-edge message passing process, which better captures the spatial dependency of regional features. Experiments on various PDE systems and one real-world dataset demonstrate that CeFeGNN achieves superior performance compared with other baselines.
arXiv Detail & Related papers (2024-09-26T16:22:08Z) - Reimagining Linear Probing: Kolmogorov-Arnold Networks in Transfer Learning [18.69601183838834]
Kolmogorov-Arnold Networks (KANs) are proposed as an enhancement to the traditional linear probing method in transfer learning.
KAN consistently outperforms traditional linear probing, achieving significant improvements in accuracy and generalization.
arXiv Detail & Related papers (2024-09-12T05:36:40Z) - High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Unifying Self-Supervised Clustering and Energy-Based Models [9.3176264568834]
We establish a principled connection between self-supervised learning and generative models. We show that our solution can be integrated into a neuro-symbolic framework to tackle a simple yet non-trivial instantiation of the symbol grounding problem.
arXiv Detail & Related papers (2023-12-30T04:46:16Z) - Joint Feature and Differentiable $k$-NN Graph Learning using Dirichlet Energy [103.74640329539389]
We propose a deep feature selection (FS) method that simultaneously conducts feature selection and differentiable $k$-NN graph learning.
We employ Optimal Transport theory to address the non-differentiability of learning $k$-NN graphs in neural networks.
We validate the effectiveness of our model with extensive experiments on both synthetic and real-world datasets.
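The non-differentiability mentioned above comes from the hard top-k selection used to build a k-NN graph. As a point of reference only, the sketch below uses a plain softmax-over-distances relaxation to produce a soft, differentiable adjacency; the paper's Optimal Transport formulation is not reproduced here.

```python
import numpy as np

def soft_knn_adjacency(X, temperature=0.1):
    """Differentiable relaxation of a k-NN graph: row i holds softmax(-d(i, j) / T) over j != i.
    A generic softmax relaxation, not the paper's Optimal Transport construction."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                                  # exclude self-edges
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)                   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)                       # soft adjacency, rows sum to 1
```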
arXiv Detail & Related papers (2023-05-21T08:15:55Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - A Unified Paths Perspective for Pruning at Initialization [0.0]
We introduce the Path Kernel as the data-independent factor in a decomposition of the Neural Tangent Kernel.
We show the global structure of the Path Kernel can be computed efficiently.
We analyze the use of this structure in approximating training and generalization performance of networks in the absence of data.
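The Neural Tangent Kernel appears both in this entry and in the main paper's analysis of training dynamics. The snippet below computes the empirical NTK entry k(x, x') = <grad_theta f(x), grad_theta f(x')> analytically for a tiny one-hidden-layer ReLU network; the Path Kernel decomposition itself is not reproduced here.

```python
import numpy as np

def empirical_ntk(W, v, x1, x2):
    """k(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> for f(x) = v^T relu(W x), no biases."""
    pre1, pre2 = W @ x1, W @ x2
    h1, h2 = np.maximum(pre1, 0.0), np.maximum(pre2, 0.0)
    term_v = h1 @ h2                                      # gradients w.r.t. the output weights v
    act1, act2 = (pre1 > 0).astype(float), (pre2 > 0).astype(float)
    term_W = np.sum(v**2 * act1 * act2) * (x1 @ x2)       # gradients w.r.t. the hidden weights W
    return term_v + term_W

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8)) / np.sqrt(8)
v = rng.normal(size=64) / np.sqrt(64)
k = empirical_ntk(W, v, rng.normal(size=8), rng.normal(size=8))
```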
arXiv Detail & Related papers (2021-01-26T04:29:50Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
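Tracking a Hessian norm as a diagnostic, as the entry above suggests, only requires Hessian-vector products. The sketch below estimates the spectral norm by power iteration over finite-difference HVPs around a generic gradient function; it is a standard estimator, not necessarily the one used in that paper.

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-4):
    """Finite-difference Hessian-vector product: Hv ~ (g(theta + eps*v) - g(theta - eps*v)) / (2*eps)."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)

def hessian_spectral_norm(grad_fn, theta, iters=30, rng=None):
    """Estimate ||H||_2 by power iteration on Hessian-vector products."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.normal(size=theta.shape)
    v /= np.linalg.norm(v)
    norm = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, theta, v)
        norm = np.linalg.norm(hv)
        v = hv / (norm + 1e-12)
    return norm
```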
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning [8.366415386275557]
Our solution involves a reformulation of the objective function for optimization in neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers.
To validate the new method, we benchmark our schemes against several popular algorithms.
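A performance-based weighting of local workers, as described above, can be illustrated with a simple softmax over negative local losses; the paper's exact aggregation rule may differ, so treat this as a sketch.

```python
import numpy as np

def weighted_aggregate(worker_params, worker_losses, temperature=1.0):
    """Combine per-worker parameter vectors, weighting workers with lower local loss more heavily.
    An illustrative rule, not necessarily the paper's exact aggregation scheme."""
    losses = np.asarray(worker_losses, dtype=float)
    weights = np.exp(-(losses - losses.min()) / temperature)   # stabilised softmax weights
    weights /= weights.sum()
    stacked = np.stack(worker_params)                          # (num_workers, num_params)
    return weights @ stacked                                   # weighted average of parameters
```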
arXiv Detail & Related papers (2020-04-07T23:38:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.