SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions
- URL: http://arxiv.org/abs/2407.04149v3
- Date: Fri, 24 Jan 2025 20:27:46 GMT
- Title: SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions
- Authors: Eric A. F. Reinhardt, P. R. Dinesh, Sergei Gleyzer
- Abstract summary: We present a model in which learnable grids of B-Spline activation functions are replaced by grids of re-weighted sine functions (SineKAN). We show that our model can perform better than, or comparably to, B-Spline KAN models and an alternative KAN implementation based on periodic cosine and sine functions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has established an alternative to traditional multi-layer perceptron neural networks in the form of Kolmogorov-Arnold Networks (KAN). The general KAN framework uses learnable activation functions on the edges of the computational graph followed by summation on nodes. The learnable edge activation functions in the original implementation are basis spline functions (B-Spline). Here, we present a model in which learnable grids of B-Spline activation functions are replaced by grids of re-weighted sine functions (SineKAN). We evaluate the numerical performance of our model on a benchmark vision task. We show that our model can perform better than, or comparably to, B-Spline KAN models and an alternative KAN implementation based on periodic cosine and sine functions representing a Fourier series. Further, we show that SineKAN has numerical accuracy that could scale comparably to dense neural networks (DNNs). Compared to the two baseline KAN models, SineKAN achieves a substantial speed increase at all hidden layer sizes, batch sizes, and depths. The current advantage of DNNs due to hardware and software optimizations is discussed, along with theoretical scaling. Additionally, properties of SineKAN relative to other KAN implementations, as well as current limitations, are discussed.
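To make the layer structure concrete, below is a minimal NumPy sketch of a KAN layer whose edge activations are grids of re-weighted sine functions, in the spirit of the abstract above. The fixed integer frequency grid, the random phase initialization, and all parameter names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class SineKANLayer:
    """Minimal sketch of a KAN layer with re-weighted sine activations on
    each edge (illustrative assumptions, not the paper's code).

    Each edge (i -> j) applies phi_ij(x) = sum_k A[i, j, k] * sin(w_k * x + p_k),
    and node j sums the edge outputs over all inputs i.
    """

    def __init__(self, in_dim, out_dim, grid_size=8, seed=None):
        rng = np.random.default_rng(seed)
        # Learnable per-edge amplitudes over the sine grid.
        self.amplitude = rng.normal(scale=1.0 / np.sqrt(in_dim * grid_size),
                                    size=(in_dim, out_dim, grid_size))
        # Shared grid of frequencies and phases (assumed fixed here;
        # they could equally be made learnable).
        self.freq = np.arange(1, grid_size + 1, dtype=float)
        self.phase = rng.uniform(0.0, 2 * np.pi, size=grid_size)

    def __call__(self, x):
        # x: (batch, in_dim) -> sine features of shape (batch, in_dim, grid_size)
        s = np.sin(x[:, :, None] * self.freq + self.phase)
        # Re-weight and sum over inputs and grid points -> (batch, out_dim)
        return np.einsum("bik,iok->bo", s, self.amplitude)


if __name__ == "__main__":
    layer = SineKANLayer(in_dim=4, out_dim=3, grid_size=8, seed=0)
    y = layer(np.random.default_rng(1).normal(size=(2, 4)))
    print(y.shape)  # (2, 3)
```

Stacking several such layers would give a full SineKAN-style network; the speed gain reported in the abstract is plausibly because a sine grid is a cheap elementwise operation compared to recursive B-Spline evaluation.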
Related papers
- KA-GNN: Kolmogorov-Arnold Graph Neural Networks for Molecular Property Prediction [16.53371673077183]
We propose the first non-trivial Kolmogorov-Arnold Network-based Graph Neural Networks (KA-GNNs).
The essential idea is to utilize KAN's unique power to optimize GNN architectures at three major levels, including node embedding, message passing, and readout.
It has been found that our KA-GNNs can outperform traditional GNN models.
arXiv Detail & Related papers (2024-10-15T06:44:57Z) - Sinc Kolmogorov-Arnold Network and Its Applications on Physics-informed Neural Networks [4.61590049339329]
We propose to use Sinc in the context of Kolmogorov-Arnold Networks, neural networks with learnable activation functions.
We show that Sinc offers a viable alternative, since it is known in numerical analysis to represent both smooth functions and functions with singularities well.
arXiv Detail & Related papers (2024-10-05T09:33:39Z) - Convolutional Kolmogorov-Arnold Networks [41.94295877935867]
We present Convolutional Kolmogorov-Arnold Networks (KANs).
KANs replace traditional fixed-weight kernels with learnable non-linear functions.
We empirically evaluate Convolutional KANs on the Fashion-MNIST dataset, demonstrating competitive accuracy with up to 50% fewer parameters compared to baseline CNNs.
arXiv Detail & Related papers (2024-06-19T02:09:44Z) - U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation [48.40120035775506]
Kolmogorov-Arnold Networks (KANs) reshape neural network learning via a stack of non-linear learnable activation functions.
We investigate, modify and re-design the established U-Net pipeline by integrating dedicated KAN layers on the tokenized intermediate representation, termed U-KAN.
We further delve into the potential of U-KAN as an alternative U-Net noise predictor in diffusion models, demonstrating its applicability in generating task-oriented model architectures.
arXiv Detail & Related papers (2024-06-05T04:13:03Z) - Approximation of RKHS Functionals by Neural Networks [30.42446856477086]
We study the approximation of functionals on reproducing kernel Hilbert spaces (RKHSs) using neural networks.
We derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels.
We apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps.
arXiv Detail & Related papers (2024-03-18T18:58:23Z) - ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present the Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT).
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, with an accuracy gap above 40% in some scenarios.
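As a rough illustration of the ENN entry above, the following sketch builds an activation from a truncated cosine series with a handful of trainable coefficients; the domain normalization and the specific basis indexing are assumptions for illustration, not the paper's exact DCT parametrization.

```python
import numpy as np


def dct_activation(x, coeffs, x_range=(-3.0, 3.0)):
    """Activation built from a truncated DCT-style cosine series (illustrative).

    f(x) = sum_k coeffs[k] * cos(pi * k * t), where t rescales x to [0, 1]
    over x_range (clipping outside it). The few entries of `coeffs` are the
    trainable parameters, which keeps the parameter count low.
    """
    lo, hi = x_range
    t = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    k = np.arange(len(coeffs))
    return np.cos(np.pi * np.outer(t.ravel(), k)).dot(coeffs).reshape(x.shape)


# Example: a few coefficients already give a smooth, sigmoid-like shape.
coeffs = np.array([0.0, -0.6, 0.0, 0.1])
print(dct_activation(np.linspace(-3.0, 3.0, 5), coeffs))
```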
arXiv Detail & Related papers (2023-07-02T21:46:30Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
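A toy version of the knob described in the entry above is sketched below: a single sinusoidal layer whose frequency scale controls how much high-frequency content (and hence how wide an effective kernel bandwidth) the network starts with. The uniform initialization and the parameter name `omega` are placeholders, not the initialization scheme proposed in the paper.

```python
import numpy as np


def sinusoidal_layer(x, in_dim, hidden_dim, omega=6.0, seed=0):
    """Toy sinusoidal layer: y = sin(omega * (W x + b)) (illustrative).

    Larger `omega` puts more energy at high frequencies, i.e. a wider
    effective bandwidth of the induced (neural tangent) kernel.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(hidden_dim, in_dim)) / in_dim
    b = rng.uniform(-1.0, 1.0, size=hidden_dim)
    return np.sin(omega * (x @ W.T + b))


x = np.linspace(-1.0, 1.0, 4).reshape(-1, 1)              # four scalar inputs
print(sinusoidal_layer(x, in_dim=1, hidden_dim=3).shape)  # (4, 3)
```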
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - EIGNN: Efficient Infinite-Depth Graph Neural Networks [51.97361378423152]
Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications.
Motivated by this limitation, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN).
We show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T08:16:58Z) - Optimization of weights and activation functions of neural networks applied to time series forecasting [0.0]
We propose the use of a family of asymmetric activation functions with a free parameter for neural networks.
We show that the defined family of activation functions satisfies the requirements of the universal approximation theorem.
A methodology is used for the global optimization of this family of activation functions with a free parameter, together with the weights of the connections between the processing units of the neural network.
arXiv Detail & Related papers (2021-07-29T23:32:15Z) - Compressing Deep ODE-Nets using Basis Function Expansions [105.05435207079759]
We consider formulations of the weights as continuous-depth functions using linear combinations of basis functions.
This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance.
In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments.
arXiv Detail & Related papers (2021-06-21T03:04:51Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Delay Differential Neural Networks [0.2538209532048866]
We propose a novel model, delay differential neural networks (DDNN), inspired by delay differential equations (DDEs).
For training DDNNs, we provide a memory-efficient adjoint method for computing gradients and back-propagating through the network.
Experiments conducted on synthetic and real-world image classification datasets such as CIFAR-10 and CIFAR-100 show the effectiveness of the proposed models.
arXiv Detail & Related papers (2020-12-12T12:20:54Z) - On the spatial attention in Spatio-Temporal Graph Convolutional Networks for skeleton-based human action recognition [97.14064057840089]
Graph convolutional networks (GCNs) achieve promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed spatio-temporal GCN-based methods improve performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z) - Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z) - Activation functions are not needed: the ratio net [3.9636371287541086]
This paper focuses on designing a new function approximator.
Instead of designing new activation functions or kernel functions, the proposed network uses a fractional (rational) form.
It shows that, in most cases, the ratio net converges faster and outperforms both the classical MLP classifier and the RBF network.
arXiv Detail & Related papers (2020-05-14T01:07:56Z) - Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)