Multi-Exit Kolmogorov-Arnold Networks: enhancing accuracy and parsimony
- URL: http://arxiv.org/abs/2506.03302v1
- Date: Tue, 03 Jun 2025 18:41:30 GMT
- Title: Multi-Exit Kolmogorov-Arnold Networks: enhancing accuracy and parsimony
- Authors: James Bagrow, Josh Bongard
- Abstract summary: Kolmogorov-Arnold Networks (KANs) combine high accuracy with interpretability, making them valuable for scientific modeling. Here we introduce multi-exit KANs, where each layer includes its own prediction branch, enabling the network to make accurate predictions at multiple depths simultaneously. This architecture provides deep supervision that improves training while discovering the right level of model complexity for each task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) uniquely combine high accuracy with interpretability, making them valuable for scientific modeling. However, it is unclear a priori how deep a network needs to be for any given task, and deeper KANs can be difficult to optimize. Here we introduce multi-exit KANs, where each layer includes its own prediction branch, enabling the network to make accurate predictions at multiple depths simultaneously. This architecture provides deep supervision that improves training while discovering the right level of model complexity for each task. Multi-exit KANs consistently outperform standard, single-exit versions on synthetic functions, dynamical systems, and real-world datasets. Remarkably, the best predictions often come from earlier, simpler exits, revealing that these networks naturally identify smaller, more parsimonious and interpretable models without sacrificing accuracy. To automate this discovery, we develop a differentiable "learning to exit" algorithm that balances contributions from exits during training. Our approach offers scientists a practical way to achieve both high performance and interpretability, addressing a fundamental challenge in machine learning for scientific discovery.
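To make the architecture concrete, here is a minimal sketch of a multi-exit network with deep supervision and a differentiable weighting over exits, in the spirit of the abstract above. The layer type (plain linear-plus-activation blocks instead of KAN layers with learnable spline activations), the MSE loss, and the softmax weighting of exit losses are simplifying assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitNet(nn.Module):
    """Network with a prediction head ("exit") after every hidden layer.
    Plain Linear+SiLU blocks stand in for KAN layers so the sketch stays
    self-contained; the paper's networks use KAN layers (learnable splines)."""

    def __init__(self, in_dim, hidden_dim, out_dim, depth=4):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * depth
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.SiLU())
            for i in range(depth)
        )
        self.exits = nn.ModuleList(nn.Linear(hidden_dim, out_dim) for _ in range(depth))
        # "learning to exit" (assumed form): one trainable logit per exit
        self.exit_logits = nn.Parameter(torch.zeros(depth))

    def forward(self, x):
        preds = []
        h = x
        for layer, exit_head in zip(self.layers, self.exits):
            h = layer(h)
            preds.append(exit_head(h))  # a prediction at every depth
        return preds

def multi_exit_loss(preds, y, exit_logits):
    """Deep supervision: all exits are trained at once; softmax weights over the
    exits are learned jointly and balance how much each exit contributes."""
    weights = torch.softmax(exit_logits, dim=0)
    losses = torch.stack([F.mse_loss(p, y) for p in preds])
    return (weights * losses).sum()
```

At inference time one simply reads out whichever exit generalizes best (or carries the largest learned weight); per the abstract, this is often an early, simpler exit.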
Related papers
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program (TP) framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Tackling the Accuracy-Interpretability Trade-off in a Hierarchy of Machine Learning Models for the Prediction of Extreme Heatwaves [41.94295877935867]
We perform probabilistic forecasts of extreme heatwaves over France using a hierarchy of increasingly complex Machine Learning models. CNNs provide higher accuracy, but their black-box nature severely limits interpretability. ScatNet achieves similar performance to CNNs while providing greater transparency.
arXiv Detail & Related papers (2024-10-01T18:15:04Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight training samples during model learning.
The framework then promotes learning by paying closer attention to training samples whose explanations differ the most.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
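The entry above describes adaptively reweighting training samples by how consistent their explanations are. As a rough illustration only (the paper's actual explanation method, perturbation scheme, and weighting rule are not specified here), a generic version might use input-gradient saliency and a small noise perturbation:

```python
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    # simple explanation: gradient of the loss with respect to the input
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y, reduction="sum")
    (grad,) = torch.autograd.grad(loss, x)
    return grad.flatten(1)

def consistency_weights(model, x, y, noise_std=0.05):
    # samples whose explanation changes most under a small perturbation get
    # larger weights (one plausible choice, not the paper's exact rule)
    s_clean = saliency(model, x, y)
    s_noisy = saliency(model, x + noise_std * torch.randn_like(x), y)
    consistency = F.cosine_similarity(s_clean, s_noisy, dim=1)  # in [-1, 1]
    return (1.0 - consistency).detach()

def train_step(model, optimizer, x, y):
    w = consistency_weights(model, x, y)
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    loss = (w * per_sample).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```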
- Neural Network Pruning by Gradient Descent [7.427858344638741]
We introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique.
We demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters.
We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
arXiv Detail & Related papers (2023-11-21T11:12:03Z)
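The pruning entry above hinges on the Gumbel-Softmax trick, which lets a discrete keep/drop mask be learned by ordinary gradient descent. Below is a minimal sketch of that idea; the per-weight gating granularity, the GatedLinear module, and the sparsity penalty are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLinear(nn.Module):
    """Linear layer whose weights are masked by (approximately) binary gates
    sampled with the Gumbel-Softmax trick, so the mask itself is trained by
    gradient descent."""

    def __init__(self, in_features, out_features, tau=1.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # one (keep, drop) logit pair per weight
        self.gate_logits = nn.Parameter(torch.zeros(out_features, in_features, 2))
        self.tau = tau

    def forward(self, x):
        if self.training:
            # hard=True gives one-hot samples in the forward pass while gradients
            # still flow to the logits (straight-through estimator)
            gates = F.gumbel_softmax(self.gate_logits, tau=self.tau, hard=True)[..., 0]
        else:
            gates = (self.gate_logits[..., 0] > self.gate_logits[..., 1]).float()
        return F.linear(x, self.linear.weight * gates, self.linear.bias)

    def sparsity_penalty(self):
        # expected fraction of kept weights; added to the task loss to encourage pruning
        return torch.softmax(self.gate_logits, dim=-1)[..., 0].mean()
```

Training would minimize the task loss plus a multiple of sparsity_penalty(); weights whose gates settle on "drop" can be removed after training.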
- Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale [54.15522908057831]
We propose an adapted version of the computationally efficient MLP-Mixer for STTD forecasting at scale.
Our results surprisingly show that this simple-yet-effective solution can rival SOTA baselines when tested on several traffic benchmarks.
Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
arXiv Detail & Related papers (2023-07-04T05:19:19Z)
- Higher-order accurate two-sample network inference and network hashing [13.984114642035692]
Two-sample hypothesis testing for network comparison presents many significant challenges.
We develop a comprehensive toolbox featuring a novel main method and its variants.
Our method outperforms existing tools in speed and accuracy, and it is proven to be power-optimal.
arXiv Detail & Related papers (2022-08-16T07:31:11Z)
- FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation [69.34011200590817]
We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation.
By modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity.
We show that FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks.
arXiv Detail & Related papers (2022-05-31T18:33:15Z)
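FiLM (feature-wise linear modulation) scales and shifts a layer's activations with per-member parameters, so one shared backbone can behave like several ensemble members. The sketch below shows the core idea on a toy MLP; the architecture, member count, and the way members are aggregated are assumptions for illustration, not FiLM-Ensemble's exact recipe.

```python
import torch
import torch.nn as nn

class FiLMEnsembleMLP(nn.Module):
    """One shared backbone; each ensemble member owns its FiLM parameters
    (a per-feature scale gamma and shift beta) that modulate the hidden layer."""

    def __init__(self, in_dim, hidden_dim, out_dim, n_members=4):
        super().__init__()
        self.n_members = n_members
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        self.gamma = nn.Parameter(torch.ones(n_members, hidden_dim))
        self.beta = nn.Parameter(torch.zeros(n_members, hidden_dim))

    def forward(self, x):
        b = x.shape[0]
        h = torch.relu(self.fc1(x)).repeat(self.n_members, 1)  # (M*B, hidden)
        member = torch.arange(self.n_members, device=x.device).repeat_interleave(b)
        h = self.gamma[member] * h + self.beta[member]          # FiLM modulation
        return self.fc2(h).view(self.n_members, b, -1)          # per-member outputs

# usage: average the members' predictions; their spread hints at uncertainty
model = FiLMEnsembleMLP(in_dim=8, hidden_dim=32, out_dim=3)
preds = model(torch.randn(5, 8))
mean, std = preds.mean(dim=0), preds.std(dim=0)
```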
- AutoDEUQ: Automated Deep Ensemble with Uncertainty Quantification [0.9449650062296824]
We propose AutoDEUQ, an automated approach for generating an ensemble of deep neural networks.
We show that AutoDEUQ outperforms probabilistic backpropagation, Monte Carlo dropout, deep ensemble, distribution-free ensembles, and hyper ensemble methods on a number of regression benchmarks.
arXiv Detail & Related papers (2021-10-26T09:12:23Z)
- IQNAS: Interpretable Integer Quadratic Programming Neural Architecture Search [40.77061519007659]
A popular approach to find fitting networks is through constrained Neural Architecture Search (NAS).
Previous methods use complicated predictors for the accuracy of the network.
We introduce Interpretable Integer Quadratic Programming Neural Architecture Search (IQNAS).
arXiv Detail & Related papers (2021-10-24T09:45:00Z)
- NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge.
Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)