Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks
- URL: http://arxiv.org/abs/2406.14916v1
- Date: Fri, 21 Jun 2024 07:20:34 GMT
- Title: Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks
- Authors: Minjong Cheon
- Abstract summary: The Kolmogorov-Arnold Network (KAN) has emerged as a potential alternative to multilayer perceptrons (MLPs).
In our study, we demonstrated the effectiveness of KAN for vision tasks through multiple trials on the MNIST, CIFAR10, and CIFAR100 datasets.
These findings suggest that KAN holds significant promise for vision tasks, and further modifications could enhance its performance in future evaluations.
- Score: 4.8951183832371
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the realm of deep learning, the Kolmogorov-Arnold Network (KAN) has emerged as a potential alternative to multilayer perceptrons (MLPs). However, its applicability to vision tasks has not been extensively validated. In our study, we demonstrated the effectiveness of KAN for vision tasks through multiple trials on the MNIST, CIFAR10, and CIFAR100 datasets, using a training batch size of 32. Our results showed that while KAN outperformed the original MLP-Mixer on CIFAR10 and CIFAR100, it performed slightly worse than the state-of-the-art ResNet-18. These findings suggest that KAN holds significant promise for vision tasks, and further modifications could enhance its performance in future evaluations. Our contributions are threefold: first, we showcase the efficiency of KAN-based algorithms for visual tasks; second, we provide extensive empirical assessments across various vision benchmarks, comparing KAN's performance with MLP-Mixer, CNNs, and Vision Transformers (ViT); and third, we pioneer the use of natural KAN layers in visual tasks, addressing a gap in previous research. This paper lays the foundation for future studies on KANs, highlighting their potential as a reliable alternative for image classification tasks.
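For intuition, here is a minimal, hedged sketch of a KAN-style layer in PyTorch. The original KAN parameterizes each input-output edge with a learnable B-spline; this sketch substitutes fixed Gaussian radial basis functions (as in FastKAN-style variants) to stay short. All names (KANLayer, num_centers, grid_range) are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """KAN-style layer: each input feature is expanded onto a fixed grid
    of Gaussian radial basis functions, and a linear map learns a separate
    univariate activation per (input, output) edge. This is an RBF
    simplification of the B-spline edges in the original KAN paper."""

    def __init__(self, in_features: int, out_features: int,
                 num_centers: int = 8, grid_range: float = 2.0):
        super().__init__()
        # Fixed RBF centers spanning [-grid_range, grid_range].
        self.register_buffer(
            "centers", torch.linspace(-grid_range, grid_range, num_centers))
        self.inv_width = num_centers / (2 * grid_range)
        # One coefficient per (edge, basis function).
        self.coeff = nn.Linear(in_features * num_centers, out_features, bias=False)
        # Residual linear path, as in common KAN implementations.
        self.base = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> phi: (batch, in_features, num_centers)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) * self.inv_width) ** 2)
        return self.base(x) + self.coeff(phi.flatten(1))

# A KAN stand-in for an MLP on flattened 28x28 MNIST images,
# with batch size 32 as in the paper's training setup.
model = nn.Sequential(nn.Flatten(), KANLayer(784, 64), KANLayer(64, 10))
logits = model(torch.randn(32, 1, 28, 28))
print(logits.shape)  # torch.Size([32, 10])
```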
Related papers
- Kolmogorov-Arnold Network Autoencoders [0.0]
Kolmogorov-Arnold Networks (KANs) are promising alternatives to Multi-Layer Perceptrons (MLPs).
KANs align closely with the Kolmogorov-Arnold representation theorem, potentially enhancing both model accuracy and interpretability.
Our results demonstrate that KAN-based autoencoders achieve competitive performance in terms of reconstruction accuracy.
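For reference, the Kolmogorov-Arnold representation theorem mentioned above states that any continuous multivariate function on a bounded domain decomposes into sums and compositions of univariate functions:

```latex
% Kolmogorov-Arnold representation theorem, for continuous
% f : [0,1]^n \to \mathbb{R}, with continuous univariate
% outer functions \Phi_q and inner functions \phi_{q,p}:
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

KAN layers mirror this structure by learning the univariate functions on the network's edges directly.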
arXiv Detail & Related papers (2024-10-02T22:56:00Z)
- Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image [87.00660347447494]
Recent advancements in Neural Surface Reconstruction (NSR) have significantly improved multi-view reconstruction when coupled with volume rendering.
We investigate feature-level consistency losses, aiming to harness valuable feature priors from diverse pretext visual tasks.
Our results, analyzed on DTU and EPFL, reveal that feature priors from image matching and multi-view stereo datasets outperform other pretext tasks.
arXiv Detail & Related papers (2024-08-04T16:09:46Z)
- NeRF Director: Revisiting View Selection in Neural Volume Rendering [21.03892888687864]
We introduce a unified framework for view selection methods and devise a benchmark to assess its impact.
We show that high-quality renderings can be achieved faster by using fewer views.
We conduct extensive experiments on both synthetic datasets and realistic data to demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2024-06-13T06:04:19Z)
- Kolmogorov-Arnold Network for Satellite Image Classification in Remote Sensing [4.8951183832371]
We propose the first approach for integrating the Kolmogorov-Arnold Network (KAN) with pre-trained Convolutional Neural Network (CNN) models for remote sensing scene classification tasks.
Our novel methodology, named KCN, aims to replace traditional Multi-Layer Perceptrons (MLPs) with KAN to enhance classification performance.
We employed multiple CNN-based models, including VGG16, MobileNetV2, EfficientNet, ConvNeXt, ResNet101, and Vision Transformer (ViT), and evaluated their performance when paired with KAN.
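A hedged sketch of that head-swap idea, reusing the illustrative KANLayer from the earlier sketch. This is not the authors' code, and the 45-class output is an assumption in the style of a RESISC45-like scene dataset:

```python
from torchvision import models

# Replace a pretrained CNN's final MLP classifier with a KAN layer.
# KANLayer is the illustrative class defined in the earlier sketch.
backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
in_feats = backbone.classifier[1].in_features  # 1280 for MobileNetV2
backbone.classifier[1] = KANLayer(in_feats, 45)  # 45 classes: an assumption
```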
arXiv Detail & Related papers (2024-06-02T03:11:37Z)
- VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for pretrained vision model (PVM) finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z)
- MOODv2: Masked Image Modeling for Out-of-Distribution Detection [57.17163962383442]
This study explores distinct pretraining tasks and employs various OOD score functions.
Our framework, MOODv2, boosts AUROC by 14.30% to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
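As a concrete, hedged illustration, here are two OOD score functions commonly compared in this line of work; MOODv2's exact choices may differ:

```python
import torch

# Two widely used OOD score functions; for both, higher values
# indicate the input looks more in-distribution.
def msp_score(logits: torch.Tensor) -> torch.Tensor:
    # Maximum softmax probability over classes.
    return logits.softmax(dim=-1).max(dim=-1).values

def energy_score(logits: torch.Tensor) -> torch.Tensor:
    # Negative free energy: log-sum-exp of the logits.
    return torch.logsumexp(logits, dim=-1)
```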
arXiv Detail & Related papers (2024-01-05T02:57:58Z)
- Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances [49.631908848868505]
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning.
We investigate the differences in CLIP performance among various neural architectures.
We propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34%.
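A minimal sketch of one such combination rule, assuming simple probability averaging across backbones; the paper's exact scheme may differ:

```python
import torch

# Combine predictions from multiple CLIP backbones by averaging each
# backbone's class probabilities, then taking the argmax.
def ensemble_predict(logits_per_backbone: list[torch.Tensor]) -> torch.Tensor:
    probs = torch.stack([l.softmax(dim=-1) for l in logits_per_backbone])
    return probs.mean(dim=0).argmax(dim=-1)
```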
arXiv Detail & Related papers (2023-12-22T03:01:41Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Efficient Self-supervised Vision Transformers for Representation Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity.
We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies.
Our results show that, by combining the two techniques, EsViT achieves 81.3% top-1 accuracy on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z)