SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions
- URL: http://arxiv.org/abs/2407.04149v3
- Date: Fri, 24 Jan 2025 20:27:46 GMT
- Title: SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions
- Authors: Eric A. F. Reinhardt, P. R. Dinesh, Sergei Gleyzer
- Abstract summary: We present a model in which learnable grids of B-Spline activation functions are replaced by grids of re-weighted sine functions (SineKAN)
We show that our model can perform better than or comparable to B-Spline KAN models and an alternative KAN implementation based on periodic cosine and sine functions.
- Abstract: Recent work has established an alternative to traditional multi-layer perceptron neural networks in the form of Kolmogorov-Arnold Networks (KAN). The general KAN framework uses learnable activation functions on the edges of the computational graph followed by summation on nodes. The learnable edge activation functions in the original implementation are basis spline functions (B-Spline). Here, we present a model in which learnable grids of B-Spline activation functions are replaced by grids of re-weighted sine functions (SineKAN). We evaluate the numerical performance of our model on a benchmark vision task. We show that our model can perform better than or comparably to B-Spline KAN models and to an alternative KAN implementation based on periodic cosine and sine functions representing a Fourier series. Further, we show that SineKAN has numerical accuracy that could scale comparably to dense neural networks (DNNs). Compared to the two baseline KAN models, SineKAN achieves a substantial speed increase at all hidden layer sizes, batch sizes, and depths. The current advantage of DNNs due to hardware and software optimizations is discussed along with theoretical scaling. Additionally, properties of SineKAN relative to other KAN implementations, as well as current limitations, are discussed.
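The abstract describes edge activations built from grids of re-weighted sine functions. A minimal sketch of such a layer is given below; this is an illustrative reading of the idea, not the authors' implementation, and the frequency grid, initialization, and which quantities are learnable are assumptions.

```python
import numpy as np

class SineKANLayer:
    """Sketch of a SineKAN-style layer: each edge activation is a
    weighted sum of sine functions over a grid of frequencies.
    Illustrative only; details differ from the paper's implementation."""

    def __init__(self, in_dim, out_dim, grid_size=8, seed=0):
        rng = np.random.default_rng(seed)
        # one learnable amplitude per (input, output, grid frequency) term
        self.amplitudes = rng.normal(
            0.0, 1.0 / np.sqrt(in_dim * grid_size),
            size=(in_dim, out_dim, grid_size))
        # fixed integer frequency grid with random phase offsets (assumed)
        self.freqs = np.arange(1, grid_size + 1, dtype=float)
        self.phases = rng.uniform(0.0, 2.0 * np.pi, size=grid_size)

    def forward(self, x):
        # x: (batch, in_dim) -> sine features of shape (batch, in_dim, grid)
        feats = np.sin(x[:, :, None] * self.freqs[None, None, :]
                       + self.phases[None, None, :])
        # re-weight the sine features and sum over inputs and grid points
        return np.einsum('big,iog->bo', feats, self.amplitudes)

layer = SineKANLayer(in_dim=4, out_dim=3)
out = layer.forward(np.ones((2, 4)))
print(out.shape)  # (2, 3)
```

Stacking such layers mirrors the KAN pattern the abstract describes: learnable functions on edges, summation on nodes.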
Related papers
- KA-GNN: Kolmogorov-Arnold Graph Neural Networks for Molecular Property Prediction [16.53371673077183]
We propose the first non-trivial Kolmogorov-Arnold Network-based Graph Neural Networks (KA-GNNs)
The essential idea is to utilize KAN's unique power to optimize GNN architectures at three major levels, including node embedding, message passing, and readout.
It has been found that our KA-GNNs can outperform traditional GNN models.
arXiv Detail & Related papers (2024-10-15T06:44:57Z) - Sinc Kolmogorov-Arnold Network and Its Applications on Physics-informed Neural Networks [4.61590049339329]
We propose to use Sinc in the context of Kolmogorov-Arnold Networks, neural networks with learnable activation functions.
We show that Sinc offers a viable alternative, since it is known in numerical analysis to represent well both smooth functions and functions with singularities.
arXiv Detail & Related papers (2024-10-05T09:33:39Z) - U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation [48.40120035775506]
Kolmogorov-Arnold Networks (KANs) reshape the neural network learning via the stack of non-linear learnable activation functions.
We investigate, modify and re-design the established U-Net pipeline by integrating the dedicated KAN layers on the tokenized intermediate representation, termed U-KAN.
We further delved into the potential of U-KAN as an alternative U-Net noise predictor in diffusion models, demonstrating its applicability in generating task-oriented model architectures.
arXiv Detail & Related papers (2024-06-05T04:13:03Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Sinusoidal neural networks have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - EIGNN: Efficient Infinite-Depth Graph Neural Networks [51.97361378423152]
Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications.
Motivated by this limitation, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN)
We show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T08:16:58Z) - Scaling Properties of Deep Residual Networks [2.6763498831034043]
We investigate the properties of weights trained by gradient descent and their scaling with network depth through numerical experiments.
We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature.
These findings cast doubts on the validity of the neural ODE model as an adequate description of deep ResNets.
arXiv Detail & Related papers (2021-05-25T22:31:30Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU networks.
We show that the dimension of the resulting features is much smaller than in other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Delay Differential Neural Networks [0.2538209532048866]
We propose a novel model, delay differential neural networks (DDNN), inspired by delay differential equations (DDEs)
For training DDNNs, we provide a memory-efficient adjoint method to compute gradients and back-propagate through the network.
Experiments conducted on synthetic and real-world image classification datasets such as CIFAR-10 and CIFAR-100 show the effectiveness of the proposed models.
arXiv Detail & Related papers (2020-12-12T12:20:54Z) - Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.