Optimized Architectures for Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2512.12448v1
- Date: Sat, 13 Dec 2025 20:14:08 GMT
- Title: Optimized Architectures for Kolmogorov-Arnold Networks
- Authors: James Bagrow, Josh Bongard
- Abstract summary: Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive. Here we study overprovisioned architectures combined with sparsification to learn compact, interpretable KANs without sacrificing accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable sparsification, turning architecture search into an end-to-end optimization problem. Across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks, we demonstrate competitive or superior accuracy while discovering substantially smaller models. Overprovisioning and sparsification are synergistic, with the combination outperforming either alone. The result is a principled path toward models that are both more expressive and more interpretable, addressing a key tension in scientific machine learning.
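To make the abstract's central idea concrete, here is a minimal sketch of overprovisioning plus differentiable sparsification, assuming a PyTorch-style KAN layer with one learnable univariate function per edge. It is an illustration of the technique, not the authors' implementation: soft gates on each edge are trained jointly with the network, and an L1 penalty on the gates makes architecture search part of the ordinary gradient-descent loop. All names and hyperparameters (EdgeGatedKANLayer, sparsity_weight) are hypothetical.

```python
# Minimal sketch (not the paper's code): an overprovisioned KAN-style model
# whose per-edge activation functions are gated by learnable soft masks.
# An L1 penalty on the gates drives unneeded edges toward zero so they can
# be pruned after training.
import torch
import torch.nn as nn

class EdgeGatedKANLayer(nn.Module):
    """One KAN-style layer: each edge (i -> j) carries its own learnable
    univariate function, here a small radial-basis expansion, plus a
    differentiable gate that can switch the edge off."""
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis),
                                    requires_grad=False)
        # Per-edge basis coefficients: (out_dim, in_dim, n_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))
        # Gate logits, one per edge; sigmoid(logit) near 1 keeps the edge.
        self.gate_logits = nn.Parameter(torch.full((out_dim, in_dim), 2.0))

    def gates(self):
        return torch.sigmoid(self.gate_logits)

    def forward(self, x):                       # x: (batch, in_dim)
        # RBF features of each input coordinate: (batch, in_dim, n_basis)
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # Per-edge univariate functions: (batch, out_dim, in_dim)
        edge_out = torch.einsum('bik,oik->boi', phi, self.coef)
        # Gate each edge, then sum over inputs (the KAN additive structure).
        return (self.gates() * edge_out).sum(dim=-1)

# Overprovision (wider hidden layer than the target function needs), then
# let the sparsity penalty discover a compact subnetwork end to end.
model = nn.Sequential(EdgeGatedKANLayer(2, 8), EdgeGatedKANLayer(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
sparsity_weight = 1e-3                          # hypothetical setting

x = torch.rand(256, 2) * 2 - 1
y = x[:, :1] ** 2 + torch.sin(3 * x[:, 1:])     # toy target

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    # Differentiable sparsification: penalize open gates.
    loss = loss + sparsity_weight * sum(
        layer.gates().abs().sum() for layer in model)
    loss.backward()
    opt.step()

# After training, edges whose gates sit near zero can be pruned outright,
# leaving a compact, more interpretable network.
```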
Related papers
- Systematic Characterization of Minimal Deep Learning Architectures: A Unified Analysis of Convergence, Pruning, and Quantization [6.49583548940407]
Deep learning networks excel at classification, yet identifying minimal architectures that reliably solve a task remains challenging.
We present a computational methodology for exploring and analyzing the relationships among convergence, pruning, and quantization.
Our initial results show that, despite architectural diversity, performance is largely invariant and learning dynamics consistently exhibit three regimes: unstable, learning, and overfitting.
arXiv Detail & Related papers (2026-01-25T20:31:10Z)
- Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture.
KINN establishes the state of the art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z)
- Principled Approximation Methods for Efficient and Scalable Deep Learning [4.082286997378594]
This thesis investigates principled approximation methods for improving the efficiency of deep learning systems.
We study three main approaches toward improved efficiency: architecture design, model compression, and optimization.
Our contributions center on tackling computationally hard problems via scalable and principled approximations.
arXiv Detail & Related papers (2025-08-29T18:17:48Z)
- CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor [6.014777261874645]
Performance predictors have emerged as a promising method to accelerate the evaluation stage of neural architecture search (NAS).
We propose a Causality-guided Architecture Representation Learning (CARL) method that aims to separate critical (causal) and redundant (non-causal) features of architectures for generalizable architecture performance prediction.
Experiments on five NAS search spaces demonstrate the state-of-the-art accuracy and superior interpretability of CARL.
arXiv Detail & Related papers (2025-06-04T14:30:55Z)
- Multi-Exit Kolmogorov-Arnold Networks: enhancing accuracy and parsimony [0.0]
Kolmogorov-Arnold Networks (KANs) combine high accuracy with interpretability, making them valuable for scientific modeling.
Here we introduce multi-exit KANs, where each layer includes its own prediction branch, enabling the network to make accurate predictions at multiple depths simultaneously (a minimal sketch of this idea appears after this list).
This architecture provides deep supervision that improves training while discovering the right level of model complexity for each task.
arXiv Detail & Related papers (2025-06-03T18:41:30Z)
- Tuning for Trustworthiness -- Balancing Performance and Explanation Consistency in Neural Network Optimization [49.567092222782435]
We introduce the novel concept of XAI consistency, defined as the agreement among different feature attribution methods.
We create a multi-objective optimization framework that balances predictive performance with explanation consistency.
Our research provides a foundation for future investigations into whether models from the trade-off zone, balancing performance loss and XAI consistency, exhibit greater robustness.
arXiv Detail & Related papers (2025-05-12T13:19:14Z)
- Sparse Mixture-of-Experts for Compositional Generalization: Empirical Evidence and Theoretical Foundations of Optimal Sparsity [89.81738321188391]
This study investigates the relationship between task complexity and optimal sparsity in SMoE models.
We show that the optimal sparsity lies between minimal activation (1-2 experts) and full activation, with the exact number scaling proportionally to task complexity (a top-k routing sketch appears after this list).
arXiv Detail & Related papers (2024-10-17T18:40:48Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD (mechanistic architecture design) synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Neural Architecture Optimization with Graph VAE [21.126140965779534]
We propose an efficient NAS approach to optimize network architectures in a continuous space.
The framework jointly learns four components: the encoder, the performance predictor, the complexity predictor and the decoder.
arXiv Detail & Related papers (2020-06-18T07:05:48Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
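The multi-exit idea from the "Multi-Exit Kolmogorov-Arnold Networks" entry above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: each hidden layer gets its own lightweight prediction head, and training supervises every exit at once (deep supervision). The layer type below is a plain placeholder rather than a KAN layer, and all names are hypothetical.

```python
# Sketch of a multi-exit network with deep supervision (illustrative only).
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    def __init__(self, in_dim, hidden, out_dim, depth=3):
        super().__init__()
        dims = [in_dim] + [hidden] * depth
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dims[d], dims[d + 1]), nn.SiLU())
            for d in range(depth))
        # One prediction branch per depth.
        self.exits = nn.ModuleList(
            nn.Linear(dims[d + 1], out_dim) for d in range(depth))

    def forward(self, x):
        preds = []
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            preds.append(exit_head(x))   # prediction at this depth
        return preds                      # one output per exit

model = MultiExitNet(in_dim=2, hidden=16, out_dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.rand(128, 2), torch.rand(128, 1)

for step in range(100):
    opt.zero_grad()
    preds = model(x)
    # Deep supervision: every exit contributes to the training loss, so
    # shallow exits stay accurate and the right depth can be chosen later.
    loss = sum(nn.functional.mse_loss(p, y) for p in preds) / len(preds)
    loss.backward()
    opt.step()
```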
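Similarly, the sparse mixture-of-experts entry above concerns how many experts to activate per input. The following is a generic top-k routing sketch under standard SMoE assumptions, not that paper's code; the parameter k is the sparsity knob whose optimal value the study relates to task complexity. All class and variable names are illustrative.

```python
# Standard top-k SMoE routing sketch: only k experts run per input.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts))

    def forward(self, x):                       # x: (batch, dim)
        scores = self.router(x)                 # (batch, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        weights = torch.softmax(topv, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run for each input (the sparsity).
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE(dim=16, n_experts=8, k=2)         # k=1-2 is "minimal activation"
y = moe(torch.randn(4, 16))                     # (4, 16)
```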
This list is automatically generated from the titles and abstracts of the papers on this site.