seqKAN: Sequence processing with Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2502.14681v1
- Date: Thu, 20 Feb 2025 16:10:18 GMT
- Title: seqKAN: Sequence processing with Kolmogorov-Arnold Networks
- Authors: Tatiana Boura, Stasinos Konstantopoulos
- Abstract summary: Kolmogorov-Arnold Networks (KANs) have recently been proposed as a machine learning framework that is more interpretable and controllable than the multi-layer perceptron.
This paper proposes seqKAN, a new KAN architecture for sequence processing.
- Abstract: Kolmogorov-Arnold Networks (KANs) have recently been proposed as a machine learning framework that is more interpretable and controllable than the multi-layer perceptron. Various network architectures have been proposed within the KAN framework targeting different tasks and application domains, including sequence processing. This paper proposes seqKAN, a new KAN architecture for sequence processing. Although multiple sequence-processing KAN architectures have already been proposed, we argue that seqKAN is more faithful to the core concept of the KAN framework. Furthermore, we empirically demonstrate that it achieves better results. The empirical evaluation is performed on data generated from a complex physics problem, covering both an interpolation and an extrapolation task. Using this dataset, we compare seqKAN against a prior KAN network for timeseries prediction, recurrent deep networks, and symbolic regression. seqKAN substantially outperforms all of these architectures, particularly on the extrapolation dataset, while also being the most transparent.
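The abstract describes the architecture only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' implementation: learnable univariate functions on every input-output edge (here parameterized with Gaussian RBF bases rather than the B-splines KAN implementations typically use), wrapped in a simple recurrence over the sequence. The names KANLayer and SeqKANCell are invented for this example.

```python
import numpy as np

class KANLayer:
    """Toy KAN layer: every input-output edge carries its own learnable
    univariate function, here a linear combination of fixed Gaussian RBF
    bases (real KAN implementations typically use B-splines)."""
    def __init__(self, in_dim, out_dim, n_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(-2.0, 2.0, n_basis)  # shared basis grid
        self.width = self.centers[1] - self.centers[0]
        # one coefficient vector per edge: shape (out_dim, in_dim, n_basis)
        self.coef = rng.normal(0.0, 0.1, (out_dim, in_dim, n_basis))

    def __call__(self, x):
        # x: (in_dim,) -> basis activations phi: (in_dim, n_basis)
        phi = np.exp(-((x[:, None] - self.centers) ** 2) / (2 * self.width ** 2))
        # evaluate each edge's univariate function, then sum over inputs
        return np.einsum("oib,ib->o", self.coef, phi)

class SeqKANCell:
    """Recurrent wrapper: the current input and the previous hidden state
    are fed jointly through one KAN layer, the way an ordinary RNN cell
    would use a dense layer."""
    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        self.kan = KANLayer(input_dim + hidden_dim, hidden_dim)

    def run(self, sequence):
        h = np.zeros(self.hidden_dim)
        for x_t in sequence:
            h = np.tanh(self.kan(np.concatenate([x_t, h])))
        return h

# usage: encode a toy sequence of 2-d observations into a hidden state
cell = SeqKANCell(input_dim=2, hidden_dim=4)
seq = [np.array([np.sin(t), np.cos(t)]) for t in np.linspace(0.0, 3.0, 10)]
print(cell.run(seq))  # final hidden state, shape (4,)
```

The point of the sketch is the structure: the only trainable parameters are per-edge basis coefficients, each defining a plottable one-dimensional function, which is where KAN-style models get their interpretability advantage over an MLP's dense weight matrices.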
Related papers
- HKAN: Hierarchical Kolmogorov-Arnold Network without Backpropagation
The Hierarchical Kolmogorov-Arnold Network (HKAN) is a novel network architecture that offers a competitive alternative to the recently proposed Kolmogorov-Arnold Network (KAN).
HKAN adopts a randomized learning approach, where the parameters of its basis functions are fixed, and linear aggregations are optimized using least-squares regression.
Empirical results show that HKAN delivers comparable, if not superior, accuracy and stability relative to KAN across various regression tasks, while also providing insights into variable importance.
arXiv Detail & Related papers (2025-01-30T08:44:54Z)
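The summary above describes a concrete, backpropagation-free recipe: draw basis-function parameters once, freeze them, and fit only the linear aggregation by least squares. A single-layer sketch of that recipe (illustrative only; HKAN itself is hierarchical and its basis functions may differ):

```python
import numpy as np

rng = np.random.default_rng(42)

# toy 1-d regression problem
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# step 1: draw basis-function parameters once and keep them fixed
n_basis = 50
W = rng.normal(size=(1, n_basis))         # random projection weights
b = rng.uniform(-3.0, 3.0, size=n_basis)  # random shifts
H = np.tanh(X @ W + b)                    # frozen nonlinear features, (200, n_basis)

# step 2: optimize only the linear aggregation by least squares (no backprop)
coef, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = H @ coef
print("train MSE:", np.mean((y - y_hat) ** 2))
```

Because only the read-out is fit, training reduces to a single linear solve, which is what makes this style of randomized learning fast and stable compared with gradient-based training.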
- TKAN: Temporal Kolmogorov-Arnold Networks
Long Short-Term Memory (LSTM) networks have demonstrated their ability to capture long-term dependencies in sequential data.
Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture that combines KANs with the LSTM: Temporal Kolmogorov-Arnold Networks (TKANs).
arXiv Detail & Related papers (2024-05-12T17:40:48Z)
- Defining Neural Network Architecture through Polytope Structures of Dataset
This paper defines upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question.
We develop an algorithm to investigate a converse situation where the polytope structure of a dataset can be inferred from its corresponding trained neural networks.
It is established that popular datasets such as MNIST, Fashion-MNIST, and CIFAR10 can be efficiently encapsulated using no more than two polytopes with a small number of faces.
arXiv Detail & Related papers (2024-02-04T08:57:42Z)
- Set-based Neural Network Encoding Without Weight Tying
We propose a neural network weight encoding method for network property prediction.
Our approach is capable of encoding neural networks from a model zoo of mixed architectures.
We introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture.
arXiv Detail & Related papers (2023-05-26T04:34:28Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
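The claim that depthwise, groupwise, and pointwise convolutions are special cases of one structured operation can be made concrete with the standard groups argument of a 2-d convolution. This PyTorch sketch only illustrates that relationship; it is not the paper's SSC layer:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)  # (batch, channels, height, width)

# groupwise: channels split into g independent groups
groupwise = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=4)

# depthwise: the extreme case groups == in_channels
depthwise = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8)

# pointwise: 1x1 kernel that mixes channels with no spatial extent
pointwise = nn.Conv2d(8, 16, kernel_size=1)

for layer in (groupwise, depthwise, pointwise):
    print(layer(x).shape)
```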
- Contextualized Embeddings based Convolutional Neural Networks for Duplicate Question Identification
Question Paraphrase Identification (QPI) is a critical task for large-scale Question-Answering forums.
We propose a novel architecture combining a Bidirectional Transformer with Convolutional Neural Networks for the QPI task.
Experimental results demonstrate that our model achieves state-of-the-art performance on the Quora Question Pairs dataset.
arXiv Detail & Related papers (2021-09-03T14:30:09Z)
- Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search
Weight sharing has become a de facto standard in neural architecture search because it enables the search to be done on commodity hardware.
Recent works have empirically shown a ranking disorder between the performance of stand-alone architectures and that of the corresponding shared-weight networks.
We propose a regularization term that aims to maximize the correlation between the performance rankings of the shared-weight network and that of the standalone architectures.
arXiv Detail & Related papers (2021-04-12T09:32:33Z)
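One way to read "maximize the correlation between the performance rankings" is as a pairwise penalty on ordering violations between the super-net's estimates and the known stand-alone accuracies of a few landmark architectures. The sketch below illustrates that reading; the paper's exact regularizer may differ:

```python
import numpy as np

def ranking_penalty(shared_scores, standalone_scores, margin=0.0):
    """Pairwise hinge penalty: for every pair of landmark architectures,
    penalize the shared-weight scores whenever their ordering disagrees
    with the stand-alone (ground-truth) ordering."""
    penalty = 0.0
    n = len(shared_scores)
    for i in range(n):
        for j in range(n):
            if standalone_scores[i] > standalone_scores[j]:
                # shared-weight scores should preserve this ordering
                penalty += max(0.0, margin + shared_scores[j] - shared_scores[i])
    return penalty / (n * (n - 1))  # average over ordered pairs

standalone = np.array([0.71, 0.68, 0.74])  # known stand-alone accuracies
shared     = np.array([0.52, 0.55, 0.58])  # super-net estimates
print(ranking_penalty(shared, standalone, margin=0.01))
```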
- DC-NAS: Divide-and-Conquer Neural Architecture Search
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve 75.1% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
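The selection criterion mentioned above, rank correlation within a chosen candidate set, is easy to make concrete. This sketch substitutes a noisy stand-in for the trained graph convolutional network and uses Kendall's tau as the rank-correlation measure; the paper's predictor and metric details may differ:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(7)

# measured accuracies of sampled sub-networks (stand-in values)
true_acc = rng.uniform(0.6, 0.8, size=20)
# a noisy surrogate predictor, standing in for the trained GCN
pred_acc = true_acc + rng.normal(0.0, 0.02, size=20)

tau, _ = kendalltau(pred_acc, true_acc)
print(f"Kendall tau over all candidates: {tau:.3f}")

# rank correlation restricted to the top-5 candidates by predicted score
top = np.argsort(pred_acc)[-5:]
tau_top, _ = kendalltau(pred_acc[top], true_acc[top])
print(f"Kendall tau in the selected set: {tau_top:.3f}")
```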
- Anytime Inference with Distilled Hierarchical Neural Ensembles
Inference in deep neural networks can be computationally expensive, and networks capable of anytime inference are important in scenarios where the amount of compute or the quantity of input data varies over time.
We propose Hierarchical Neural Ensembles (HNE), a novel framework to embed an ensemble of multiple networks in a hierarchical tree structure, sharing intermediate layers.
Our experiments show that, compared to previous anytime inference models, HNE provides state-of-the-art accuracy-compute trade-offs on the CIFAR-10/100 and ImageNet datasets.
arXiv Detail & Related papers (2020-03-03T12:13:38Z)
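A toy, depth-two version of the sharing idea described above: a trunk computed once, branch layers shared by the leaves beneath them, and an ensemble average that grows with the compute budget, so inference can stop at any time. All class and parameter names here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def dense(in_dim, out_dim):
    """Random weight matrix standing in for a trained layer."""
    return rng.normal(0.0, 0.3, (in_dim, out_dim))

class TreeEnsembleSketch:
    """Toy tree-structured ensemble: one shared trunk, two branches, and
    four leaf heads (two per branch). Leaves under the same branch reuse
    its activations, so each extra ensemble member adds only the marginal
    cost of its unshared layers."""
    def __init__(self, in_dim=8, hidden=16, n_classes=3):
        self.trunk = dense(in_dim, hidden)
        self.branches = [dense(hidden, hidden) for _ in range(2)]
        self.leaves = [dense(hidden, n_classes) for _ in range(4)]

    def predict(self, x, budget):
        """Average the first `budget` leaves; evaluation can stop at any
        budget, which is what makes the inference 'anytime'."""
        h = np.tanh(x @ self.trunk)          # shared by every leaf
        branch_out, logits = {}, []
        for k in range(budget):
            b = k // 2                       # leaf k hangs off branch k // 2
            if b not in branch_out:          # compute each branch at most once
                branch_out[b] = np.tanh(h @ self.branches[b])
            logits.append(branch_out[b] @ self.leaves[k])
        return np.mean(logits, axis=0)

model = TreeEnsembleSketch()
x = rng.normal(size=8)
for budget in (1, 2, 4):                     # more compute, larger ensemble
    print(budget, np.round(model.predict(x, budget), 3))
```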
- Residual Attention Net for Superior Cross-Domain Time Sequence Modeling
This paper serves as a proof-of-concept for a new architecture, the Residual Attention Net (RAN), which aims to give the model a higher-level understanding of sequence patterns.
We achieve 35 state-of-the-art results, 10 of which match the current state of the art without further model fine-tuning.
arXiv Detail & Related papers (2020-01-13T06:14:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.