KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example
- URL: http://arxiv.org/abs/2408.02743v1
- Date: Mon, 5 Aug 2024 18:01:07 GMT
- Title: KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example
- Authors: Johannes Erdmann, Florian Mausolf, Jan Lukas Späh
- Abstract summary: Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons.
We study a typical binary event classification task in high-energy physics.
We find that the learned activation functions of a one-layer KAN resemble the log-likelihood ratio of the input features.
- Score: 0.08192907805418582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons, suggesting advantages in performance and interpretability. We study a typical binary event classification task in high-energy physics including high-level features and comment on the performance and interpretability of KANs in this context. We find that the learned activation functions of a one-layer KAN resemble the log-likelihood ratio of the input features. In deeper KANs, the activations in the first KAN layer differ from those in the one-layer KAN, which indicates that the deeper KANs learn more complex representations of the data. We study KANs with different depths and widths and we compare them to multilayer perceptrons in terms of performance and number of trainable parameters. For the chosen classification task, we do not find that KANs are more parameter efficient. However, small KANs may offer advantages in terms of interpretability that come at the cost of only a moderate loss in performance.
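To make the one-layer case concrete, the following is a minimal, hypothetical sketch of a one-layer KAN-style binary classifier in PyTorch. It is not the authors' implementation: it uses Gaussian radial basis functions in place of B-splines, synthetic Gaussian "signal" and "background" features, and illustrative names throughout. After training, each learned univariate activation $\phi_i$ can be plotted against the per-feature log-likelihood ratio $\log(p_\mathrm{sig}(x_i)/p_\mathrm{bkg}(x_i))$ to reproduce the kind of comparison the abstract describes.

```python
# Hypothetical sketch, not the paper's code: a one-layer KAN-style
# classifier where each feature passes through its own learnable
# univariate function (a sum of fixed Gaussian RBFs instead of B-splines).
import torch
import torch.nn as nn

class OneLayerKAN(nn.Module):
    def __init__(self, n_features, n_basis=16, x_min=-3.0, x_max=3.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, n_basis))
        self.width = (x_max - x_min) / n_basis
        # phi_i(x) = sum_j coeffs[i, j] * exp(-((x - c_j) / width)^2)
        self.coeffs = nn.Parameter(0.01 * torch.randn(n_features, n_basis))

    def phi(self, x):
        # x: (batch, n_features) -> per-feature activations (batch, n_features)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        return (basis * self.coeffs).sum(dim=-1)

    def forward(self, x):
        # Sum the univariate activations over features; sigmoid -> P(signal | x).
        return torch.sigmoid(self.phi(x).sum(dim=-1))

torch.manual_seed(0)
sig = torch.randn(4096, 4) + 0.5           # toy "signal" events
bkg = torch.randn(4096, 4) - 0.5           # toy "background" events
X = torch.cat([sig, bkg])
y = torch.cat([torch.ones(4096), torch.zeros(4096)])

model = OneLayerKAN(n_features=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCELoss()
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```

Because the model is just a sum of one univariate curve per feature, each curve can be inspected directly, which is the interpretability property the abstract attributes to small KANs.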
Related papers
- Exploring Kolmogorov-Arnold Networks for Interpretable Time Series Classification [0.17999333451993949]
Kolmogorov-Arnold Networks (KANs) have been proposed as a more interpretable alternative to state-of-the-art models.
In this paper, we aim to conduct a comprehensive and robust exploration of the KAN architecture for time series classification.
Our results show that (1) Efficient KAN outperforms in both performance and computational efficiency, showcasing its suitability for classification tasks.
arXiv Detail & Related papers (2024-11-22T13:01:36Z)
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z)
- A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks [43.70716358136333]
Kolmogorov-Arnold Networks (KAN) are based on a fundamentally different mathematical framework.
KANs address several major issues of MLPs, such as forgetting in continual learning scenarios.
We extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision.
arXiv Detail & Related papers (2024-09-20T14:49:21Z)
- Activation Space Selectable Kolmogorov-Arnold Networks [29.450377034478933]
Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been proven to achieve performance comparable to MLP-based methods.
Despite this potential, the use of a single activation function space results in reduced performance of KAN and related works across different tasks.
This work contributes to the understanding of the data-centric design of new AI and provides a foundational reference for innovations in KAN-based network architectures.
arXiv Detail & Related papers (2024-08-15T11:34:05Z)
- Rethinking the Function of Neurons in KANs [1.223779595809275]
The neurons of Kolmogorov-Arnold Networks (KANs) perform a simple summation motivated by the Kolmogorov-Arnold representation theorem.
In this work, we investigate the potential for identifying an alternative multivariate function for KAN neurons that may offer increased practical utility.
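For reference, the summation these neurons perform comes from the Kolmogorov-Arnold representation theorem, which states that any continuous function on $[0,1]^n$ can be written as

```latex
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

so a KAN neuron only ever adds the outputs of its incoming univariate functions; the question raised here is whether replacing that inner sum with a different multivariate aggregation is more useful in practice.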
arXiv Detail & Related papers (2024-07-30T09:04:23Z)
- U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation [48.40120035775506]
Kolmogorov-Arnold Networks (KANs) reshape neural network learning via a stack of non-linear learnable activation functions.
We investigate, modify and re-design the established U-Net pipeline by integrating dedicated KAN layers on the tokenized intermediate representation, termed U-KAN.
We further delve into the potential of U-KAN as an alternative U-Net noise predictor in diffusion models, demonstrating its applicability in generating task-oriented model architectures.
arXiv Detail & Related papers (2024-06-05T04:13:03Z)
- Kernel function impact on convolutional neural networks [10.98068123467568]
We study the usage of kernel functions at the different layers in a convolutional neural network.
We show how one can effectively leverage kernel functions by introducing more distortion-aware pooling layers.
We propose Kernelized Dense Layers (KDL), which replace fully-connected layers.
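As a rough illustration of the idea (the paper's exact KDL formulation may differ, and the class name and parameters below are hypothetical), a dense layer can be "kernelized" by replacing each unit's inner product $w_j \cdot x$ with a kernel evaluation between the weight vector and the input, for example a polynomial kernel:

```python
# Hypothetical sketch of a kernelized dense layer: each output unit
# evaluates a polynomial kernel k(w, x) = (w . x + c)^d between its
# weight vector and the input, instead of the plain inner product.
import torch
import torch.nn as nn

class KernelizedDense(nn.Module):
    def __init__(self, in_features, out_features, degree=2, c=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * in_features ** -0.5)
        self.degree = degree
        self.c = c

    def forward(self, x):
        # (batch, in) @ (in, out) -> (batch, out), then kernel nonlinearity
        return (x @ self.weight.T + self.c) ** self.degree

layer = KernelizedDense(128, 64)
out = layer(torch.randn(8, 128))   # -> shape (8, 64)
```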
arXiv Detail & Related papers (2023-02-20T19:57:01Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
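One simple way to realize such "within-layer" feedback (a sketch of the general idea, not necessarily the paper's data-dependent WLD-Reg term) is to penalize redundancy among units of the same hidden layer, for instance via the off-diagonal entries of the activation covariance, and add that penalty to the task loss:

```python
import torch

def within_layer_diversity_penalty(h: torch.Tensor) -> torch.Tensor:
    # h: (batch, units) activations of one hidden layer.
    # Penalizing squared off-diagonal covariance pushes units to carry
    # less redundant, i.e. more diverse, information.
    hc = h - h.mean(dim=0, keepdim=True)
    cov = hc.T @ hc / h.shape[0]
    off_diag = cov - torch.diag(torch.diag(cov))
    return off_diag.pow(2).mean()

# usage: loss = task_loss + lambda_div * within_layer_diversity_penalty(hidden)
```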
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
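Concretely, a depth-2 model with these ingredients can be written as (a generic form consistent with the listed features, not necessarily the paper's exact parametrization)

```latex
f(x) = \sum_{j=1}^{m} a_j \, \mathrm{ReLU}\!\left( w_j^\top x \right),
\qquad
L(\theta) = \frac{1}{2} \sum_{i=1}^{N} \left( f(x_i) - y_i \right)^2,
```

with gradient descent run on the quadratic loss $L$.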
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L_2$-distance between the hidden-layer features.
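Measured this way, the diversity of a hidden layer takes a few lines to compute (a sketch; the variable names are illustrative):

```python
import torch

def average_pairwise_distance(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, units) hidden-layer activations. Each unit's
    # activation pattern over the batch is treated as a vector, and the
    # pairwise L2 distances between units are averaged.
    return torch.pdist(features.T, p=2).mean()
```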
arXiv Detail & Related papers (2021-06-10T19:14:45Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
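The rank correlation in question can be evaluated, for example, with a Kendall-tau computation between predicted and measured sub-network accuracies (a sketch; the arrays are made-up and the paper may use a different rank statistic):

```python
from scipy.stats import kendalltau

predicted = [0.71, 0.64, 0.69, 0.75, 0.60]  # GCN-predicted performance per sub-network
measured = [0.72, 0.61, 0.70, 0.74, 0.63]   # ground-truth performance after training

tau, p_value = kendalltau(predicted, measured)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")
```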
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.