Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
- URL: http://arxiv.org/abs/2505.04740v1
- Date: Wed, 07 May 2025 19:13:17 GMT
- Title: Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
- Authors: Sainath Dey, Mitul Goswami, Jashika Sethi, Prasant Kumar Pattnaik
- Abstract summary: This study introduces Hybrid Kolmogorov-Arnold Network (KAN)-ViT (Hyb-KAN ViT) to address the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs). Hyb-KAN ViT is a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study addresses the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs) by introducing Hybrid Kolmogorov-Arnold Network (KAN)-ViT (Hyb-KAN ViT), a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions. Prior work has failed to exploit the prebuilt modularity of the ViT architecture and the edge-detection capabilities of wavelet functions. We propose two key modules: Efficient-KAN (Eff-KAN), which replaces MLP layers with spline functions, and Wavelet-KAN (Wav-KAN), which leverages orthogonal wavelet transforms for multi-resolution feature extraction. These modules are systematically integrated into ViT encoder layers and classification heads to enhance spatial-frequency modeling while mitigating computational bottlenecks. Experiments on ImageNet-1K (image recognition), COCO (object detection and instance segmentation), and ADE20K (semantic segmentation) demonstrate state-of-the-art performance with Hyb-KAN ViT. Ablation studies validate the efficacy of wavelet-driven spectral priors in segmentation and of spline-based efficiency in detection tasks. The framework establishes a new paradigm for balancing parameter efficiency and multi-scale representation in vision architectures.
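As a toy illustration of the Eff-KAN idea described above, each input-output edge applies its own learnable 1-D function rather than a shared fixed activation. This is a hedged sketch, not the paper's implementation: the class name, the substitution of Gaussian radial basis functions for true B-splines, and all parameter values are assumptions.

```python
import numpy as np

class SplineEdgeLayer:
    """Toy KAN-style layer: every input-output edge applies a learnable
    1-D function, parameterized as a weighted sum of Gaussian radial
    basis functions (a simple stand-in for B-splines)."""

    def __init__(self, in_dim, out_dim, num_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.centers = np.linspace(-2.0, 2.0, num_basis)  # fixed knot centers
        self.width = self.centers[1] - self.centers[0]
        # one set of basis coefficients per (output, input) edge
        self.coef = rng.normal(0.0, 0.1, size=(out_dim, in_dim, num_basis))

    def __call__(self, x):
        # x: (batch, in_dim) -> basis activations (batch, in_dim, num_basis)
        phi = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # sum over input dims and basis functions for each output unit
        return np.einsum("bik,oik->bo", phi, self.coef)

layer = SplineEdgeLayer(in_dim=4, out_dim=3)
y = layer(np.random.default_rng(1).normal(size=(5, 4)))
print(y.shape)  # (5, 3)
```

Because each edge function is a linear combination of fixed basis functions, the layer stays differentiable in its coefficients while allowing far more flexible per-edge shapes than an MLP with a single shared nonlinearity.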
Related papers
- Frequency-Dynamic Attention Modulation for Dense Prediction [14.066404173580864]
We propose a circuit-theory-inspired strategy called Frequency-Dynamic Attention Modulation (FDAM). FDAM directly modulates the overall frequency response of Vision Transformers (ViTs).
arXiv Detail & Related papers (2025-07-16T07:59:54Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
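The low-/high-frequency decomposition FreSca describes can be sketched with a plain FFT band split. This is an illustrative stand-in, not the paper's code; the function name, the circular cutoff mask, and the scaling parameters are all assumptions.

```python
import numpy as np

def frequency_rescale(x, low_scale=1.0, high_scale=1.0, cutoff=0.25):
    """Split a 2-D signal into low- and high-frequency bands with an FFT
    mask, scale each band independently, and transform back."""
    f = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    # circular low-pass mask: True inside the normalized cutoff radius
    low = np.sqrt((yy / h) ** 2 + (xx / w) ** 2) <= cutoff
    f_scaled = f * np.where(low, low_scale, high_scale)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f_scaled)))
```

With both scales equal to 1 the operation is the identity; raising `high_scale` above 1 amplifies fine detail while leaving coarse structure untouched, which is the kind of band-wise control the summary refers to.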
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - BHViT: Binarized Hybrid Vision Transformer [53.38894971164072]
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNNs). We propose BHViT, a binarization-friendly hybrid ViT architecture, together with its fully binarized model, guided by three important observations. Our proposed algorithm achieves SOTA performance among binary ViT methods.
arXiv Detail & Related papers (2025-03-04T08:35:01Z) - Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification [8.600534616819333]
This paper presents a Spatial-Spectral Diffusion Contrastive Representation Network (DiffCRN). DiffCRN is based on a denoising diffusion probabilistic model (DDPM) combined with contrastive learning (CL) for hyperspectral image classification. Experiments conducted on four widely used HSI datasets demonstrate the improved performance of the proposed DiffCRN.
arXiv Detail & Related papers (2025-02-27T02:34:23Z) - FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation [50.9040167152168]
We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system. We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain. To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB). We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification [3.009351592961681]
This paper employs a Wavelet-based Kolmogorov-Arnold Network (Wav-KAN) architecture tailored for efficient modeling of complex dependencies.
Wavelet-based activation allows Wav-KAN to effectively capture multi-scale spatial and spectral patterns.
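A minimal sketch of the wavelet-activation idea behind Wav-KAN (assuming a Mexican-hat mother wavelet; the function names and the scale/shift/weight parameterization are illustrative, not taken from the paper):

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet: the normalized negative
    second derivative of a Gaussian."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def wavelet_activation(x, scale, shift, weight):
    """Wav-KAN-style unit: a weighted, scaled, and shifted mother wavelet
    applied elementwise, so each unit responds to a particular band of
    spatial frequencies rather than acting as a monotone nonlinearity."""
    return weight * mexican_hat((x - shift) / scale)

x = np.linspace(-4.0, 4.0, 9)
y = wavelet_activation(x, scale=1.0, shift=0.0, weight=1.0)
```

Stacking units with different learned scales is what gives such a layer its multi-scale character: small scales respond to sharp local variation, large scales to smooth trends.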
arXiv Detail & Related papers (2024-06-12T04:52:40Z) - Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z) - Transformer variational wave functions for frustrated quantum spin systems [0.0]
We propose an adaptation of the ViT architecture with complex parameters to define a new class of variational neural-network states.
The success of the ViT wave function relies on mixing both local and global operations.
arXiv Detail & Related papers (2022-11-10T11:56:44Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
A new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
The model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
The approach is further applied to the recent transformer-based image recognition model ViT, showing consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.