Projective Kolmogorov Arnold Neural Networks (P-KANs): Entropy-Driven Functional Space Discovery for Interpretable Machine Learning
- URL: http://arxiv.org/abs/2509.20049v1
- Date: Wed, 24 Sep 2025 12:15:37 GMT
- Title: Projective Kolmogorov Arnold Neural Networks (P-KANs): Entropy-Driven Functional Space Discovery for Interpretable Machine Learning
- Authors: Alastair Poole, Stig McArthur, Saravan Kumar
- Abstract summary: Kolmogorov-Arnold Networks (KANs) relocate learnable nonlinearities from nodes to edges. Current KANs suffer from fundamental inefficiencies due to redundancy in high-dimensional spline parameter spaces. We introduce Projective Kolmogorov-Arnold Networks (P-KANs), a novel training framework that guides edge function discovery.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) relocate learnable nonlinearities from nodes to edges, demonstrating remarkable capabilities in scientific machine learning and interpretable modeling. However, current KAN implementations suffer from fundamental inefficiencies due to redundancy in high-dimensional spline parameter spaces, where numerous distinct parameterisations yield functionally equivalent behaviors. This redundancy manifests as a "nuisance space" in the model's Jacobian, leading to susceptibility to overfitting and poor generalization. We introduce Projective Kolmogorov-Arnold Networks (P-KANs), a novel training framework that guides edge function discovery towards interpretable functional representations through entropy-minimisation techniques from signal analysis and sparse dictionary learning. Rather than constraining functions to predetermined spaces, our approach maintains spline space flexibility while introducing "gravitational" terms that encourage convergence towards optimal functional representations. Our key insight recognizes that optimal representations can be identified through entropy analysis of projection coefficients, compressing edge functions to lower-parameter projective spaces (Fourier, Chebyshev, Bessel). P-KANs demonstrate superior performance across multiple domains, achieving up to 80% parameter reduction while maintaining representational capacity, significantly improved robustness to noise compared to standard KANs, and successful application to industrial automated fiber placement prediction. Our approach enables automatic discovery of mixed functional representations where different edges converge to different optimal spaces, providing both compression benefits and enhanced interpretability for scientific machine learning applications.
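To make the mechanism concrete, below is a minimal numpy sketch of the entropy-driven projection idea described in the abstract: an edge function is projected onto candidate dictionaries (Chebyshev and Fourier here), the Shannon entropy of the normalised coefficient energies identifies the most compressible space, and a "gravitational" penalty pulls the function toward that projection. The function names, dictionary construction, and penalty form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def coeff_entropy(c, eps=1e-12):
    """Shannon entropy of the normalised coefficient energies."""
    p = c**2 / (np.sum(c**2) + eps)
    return -np.sum(p * np.log(p + eps))

def project(y, basis):
    """Least-squares projection of samples y onto a dictionary (n_points x n_atoms)."""
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coeffs, basis @ coeffs

# Evaluate an edge function (a stand-in for a learned spline) on a grid.
x = np.linspace(-1, 1, 200)
y = np.cos(3 * np.arccos(np.clip(x, -1, 1)))  # secretly T_3, so Chebyshev should win

# Candidate projective spaces: Chebyshev and Fourier dictionaries.
K = 8
cheb = np.polynomial.chebyshev.chebvander(x, K)          # (200, K+1)
four = np.column_stack([np.ones_like(x)] +
                       [f(np.pi * k * x) for k in range(1, K // 2 + 1)
                        for f in (np.sin, np.cos)])

# Pick the space whose projection coefficients have the lowest entropy.
entropy, name, basis = min(((coeff_entropy(project(y, B)[0]), n, B)
                            for n, B in [("chebyshev", cheb), ("fourier", four)]),
                           key=lambda t: t[0])
_, y_hat = project(y, basis)

# "Gravitational" term: penalise distance between the spline and its
# low-entropy projection, pulling the edge toward that functional space.
grav_loss = np.mean((y - y_hat)**2)
print(f"best space: {name}, coefficient entropy: {entropy:.3f}, pull term: {grav_loss:.2e}")
```

Because the test function is exactly a degree-3 Chebyshev polynomial, its Chebyshev coefficients concentrate on a single atom and the entropy collapses toward zero; a low-entropy coefficient profile of this kind is the signal the abstract describes for selecting a projective space.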
Related papers
- FEKAN: Feature-Enriched Kolmogorov-Arnold Networks [0.34376560669160394]
Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative to multilayer perceptrons. FEKAN is a simple yet effective extension that preserves all the advantages of KAN while improving computational efficiency and predictive accuracy.
arXiv Detail & Related papers (2026-02-18T15:17:55Z)
- Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
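The summary gives only the high-level idea, but a heavily hedged sketch of one plausible reading of "nested subspaces" follows: every compute budget evaluates a leading slice of the same weight matrix, so smaller models are literally nested inside larger ones. The slicing scheme is a guess from the summary, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))   # full-capacity layer weights

def nested_forward(x, W, width):
    """Evaluate the layer restricted to its leading `width` hidden units,
    so smaller budgets reuse a nested slice of the same parameters."""
    return np.maximum(W[:width, :x.shape[0]] @ x, 0.0)

x = rng.standard_normal(64)
for width in (8, 16, 32, 64):       # a spectrum of compute budgets
    h = nested_forward(x, W, width)
    print(width, h.shape)
```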
arXiv Detail & Related papers (2025-09-22T15:13:14Z)
- Function Forms of Simple ReLU Networks with Random Hidden Weights [1.2289361708127877]
We investigate the function space dynamics of a two-layer ReLU neural network in the infinite-width limit. We highlight the Fisher information matrix's role in steering learning. This work offers a robust foundation for understanding wide neural networks.
arXiv Detail & Related papers (2025-05-23T13:53:02Z)
- Extension of Symmetrized Neural Network Operators with Fractional and Mixed Activation Functions [0.0]
We propose a novel extension to symmetrized neural network operators by incorporating fractional and mixed activation functions. Our framework introduces a fractional exponent in the activation functions, allowing adaptive non-linear approximations with improved accuracy.
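As a rough illustration of a fractional exponent inside an activation, here is one plausible form (a fractional-power ReLU); the paper's exact functional form is not reproduced here, so treat this as an assumption.

```python
import numpy as np

def fractional_relu(x, alpha):
    """A fractional-exponent activation: max(x, 0)**alpha.
    alpha is a tunable (possibly learnable) fractional power;
    alpha = 1 recovers the ordinary ReLU."""
    return np.maximum(x, 0.0) ** alpha

x = np.linspace(-2, 2, 9)
for alpha in (0.5, 1.0, 1.5):
    print(alpha, np.round(fractional_relu(x, alpha), 3))
```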
arXiv Detail & Related papers (2025-01-17T14:24:25Z)
- TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training [91.8932638236073]
We introduce TensorGRaD, a novel method that directly addresses the memory challenges associated with large structured weights. We show that sparseGRaD reduces total memory usage by over 50% while maintaining and sometimes even improving accuracy.
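A small sketch of the sparse-plus-low-rank gradient decomposition that the title and summary suggest: keep the largest-magnitude gradient entries exactly, compress the remainder with a truncated SVD, and store only those pieces. The split and the memory accounting are illustrative, not the paper's algorithm.

```python
import numpy as np

def sparse_plus_lowrank(G, k_sparse, rank):
    """Split a gradient matrix into a sparse part (largest entries)
    plus a low-rank sketch of the remainder; store only these pieces."""
    S = np.zeros_like(G)
    idx = np.unravel_index(np.argsort(np.abs(G), axis=None)[-k_sparse:], G.shape)
    S[idx] = G[idx]                            # keep the k largest-magnitude entries
    U, s, Vt = np.linalg.svd(G - S, full_matrices=False)
    L = U[:, :rank] * s[:rank] @ Vt[:rank]     # rank-r sketch of the remainder
    return S, L

rng = np.random.default_rng(0)
G = rng.standard_normal((128, 128))
S, L = sparse_plus_lowrank(G, k_sparse=512, rank=8)
stored = 3 * 512 + 8 * (128 + 128 + 1)         # sparse triplets + SVD factors
print(f"reconstruction error: {np.linalg.norm(G - S - L) / np.linalg.norm(G):.3f}")
print(f"stored values: {stored} vs dense {G.size} ({1 - stored / G.size:.0%} saved)")
```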
arXiv Detail & Related papers (2025-01-04T20:51:51Z)
- Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function Approximation [0.0]
This paper presents the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a new neural network architecture inspired by the Kolmogorov-Arnold theorem.
By utilizing learnable functions parametrized by Chebyshev polynomials on the network's edges, Chebyshev KANs enhance flexibility, efficiency, and interpretability in function approximation tasks.
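A minimal sketch of a Chebyshev-parametrized KAN edge: inputs are squashed into [-1, 1] (the Chebyshev domain) and the edge function is a learnable Chebyshev series. The tanh squashing and coefficient scale are common choices assumed here, not details taken from the paper.

```python
import numpy as np

class ChebyshevEdge:
    """One KAN edge: phi(x) = sum_k c_k T_k(tanh(x)).
    tanh squashes inputs into [-1, 1], the Chebyshev domain;
    the coefficients c_k are the edge's learnable parameters."""
    def __init__(self, degree, rng):
        self.coeffs = rng.standard_normal(degree + 1) * 0.1

    def __call__(self, x):
        t = np.tanh(x)
        return np.polynomial.chebyshev.chebval(t, self.coeffs)

rng = np.random.default_rng(0)
edge = ChebyshevEdge(degree=5, rng=rng)
print(edge(np.array([-2.0, 0.0, 2.0])))
```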
arXiv Detail & Related papers (2024-05-12T07:55:43Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
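As a toy illustration of offset-based aggregation in the spirit of this summary: predicted kernel offsets shift where each tap samples the feature map before a weighted sum. Real models predict per-pixel offsets and use bilinear sampling; the global per-tap offsets and nearest-neighbour gather below are simplifying assumptions.

```python
import numpy as np

def offset_aggregate(feat, offsets, weights):
    """Aggregate features at content-conditioned offset locations
    (nearest-neighbour gather; real models use bilinear sampling)."""
    H, W = feat.shape
    out = np.zeros_like(feat)
    for k in range(offsets.shape[0]):          # one offset per kernel tap
        dy, dx = offsets[k]
        ys = np.clip(np.arange(H)[:, None] + int(round(dy)), 0, H - 1)
        xs = np.clip(np.arange(W)[None, :] + int(round(dx)), 0, W - 1)
        out += weights[k] * feat[ys, xs]
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
offsets = np.array([[0.0, 0.0], [1.2, -0.7], [-2.1, 1.5]])  # predicted per-tap shifts
weights = np.array([0.5, 0.3, 0.2])
print(offset_aggregate(feat, offsets, weights).shape)
```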
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks [78.11734286268455]
We study the performance of ConvResNeXts, trained with weight decay, from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks.
arXiv Detail & Related papers (2023-07-04T11:08:03Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
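The LLA itself is a standard construction, so a compact last-layer variant is easy to sketch: the posterior over the final linear weights is a Gaussian whose covariance comes from the (here, exact) Gauss-Newton term, giving closed-form predictive variances that grow away from the data, which is what makes it useful as a Bayesian-optimization surrogate. The random-feature "network" below is a stand-in, not the paper's setup.

```python
import numpy as np

# Linearised-Laplace predictive for a model that is linear in its last-layer
# weights: f(x) = w . phi(x). The Laplace posterior over w is Gaussian with
# covariance (J^T J / sigma^2 + I / alpha)^(-1), where J is the feature matrix.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)

def phi(x):                                  # fixed random-feature map
    return np.cos(np.outer(x, np.linspace(0.5, 4.0, 20)))

sigma2, alpha = 0.01, 1.0
J = phi(X)
Sigma = np.linalg.inv(J.T @ J / sigma2 + np.eye(J.shape[1]) / alpha)
w_map = Sigma @ J.T @ y / sigma2

x_test = np.linspace(-4, 4, 5)
J_test = phi(x_test)
mean = J_test @ w_map
var = np.einsum("ij,jk,ik->i", J_test, Sigma, J_test) + sigma2  # diag(J S J^T)
print(np.round(mean, 2), np.round(np.sqrt(var), 2))  # uncertainty grows off-data
```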
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
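A tabular sketch of pessimistic fitted Q-learning, the algorithm the summary names: each fitted backup subtracts a count-based uncertainty penalty, so poorly covered state-action pairs receive conservative value estimates. The synthetic dataset and the form of the penalty are illustrative choices.

```python
import numpy as np

# Tabular pessimistic fitted Q-iteration on a logged dataset:
# the Bellman backup is penalised by a count-based uncertainty bonus,
# so rarely-visited state-action pairs get conservative values.
n_s, n_a, gamma, beta = 5, 2, 0.9, 1.0
rng = np.random.default_rng(0)
data = [(rng.integers(n_s), rng.integers(n_a), rng.random(), rng.integers(n_s))
        for _ in range(200)]                      # (s, a, r, s') transitions

counts = np.zeros((n_s, n_a))
for s, a, _, _ in data:
    counts[s, a] += 1

Q = np.zeros((n_s, n_a))
for _ in range(50):                               # fitted Q-iteration sweeps
    target_sum = np.zeros((n_s, n_a))
    for s, a, r, s2 in data:
        target_sum[s, a] += r + gamma * Q[s2].max()
    mean_target = np.divide(target_sum, counts, out=np.zeros_like(Q),
                            where=counts > 0)
    Q = mean_target - beta / np.sqrt(np.maximum(counts, 1))  # pessimism penalty

print(np.round(Q, 2))
```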
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Contrastive Conditional Neural Processes [45.70735205041254]
Conditional Neural Processes (CNPs) bridge neural networks with probabilistic inference to approximate functions of stochastic processes under meta-learning settings.
Two auxiliary contrastive branches are set up hierarchically, namely in-instantiation temporal contrastive learning (TCL) and cross-instantiation function contrastive learning (FCL).
We empirically show that TCL captures high-level abstraction of observations, whereas FCL helps identify underlying functions, which in turn provides more efficient representations.
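Both branches are contrastive objectives, so a generic InfoNCE loss illustrates the FCL side: representations of the same function instantiation are positives, and all other rows in the batch are negatives. This is a standard contrastive loss, not the paper's exact formulation.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss: row i of z1 and row i of z2 come from the same
    function instantiation (positives); all other rows are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                  # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 16))              # per-function representations
z1 = f + 0.1 * rng.standard_normal(f.shape)   # two noisy "views" of each function
z2 = f + 0.1 * rng.standard_normal(f.shape)
print(f"FCL-style loss: {info_nce(z1, z2):.3f}")
```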
arXiv Detail & Related papers (2022-03-08T10:08:45Z)
- Learning Sub-Patterns in Piecewise Continuous Functions [4.18804572788063]
Most gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters.
This paper focuses on the case where the discontinuities arise from distinct sub-patterns.
We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure.
arXiv Detail & Related papers (2020-10-29T13:44:13Z)