Spectral-Adaptive Modulation Networks for Visual Perception
- URL: http://arxiv.org/abs/2503.23947v1
- Date: Mon, 31 Mar 2025 10:53:42 GMT
- Title: Spectral-Adaptive Modulation Networks for Visual Perception
- Authors: Guhnoo Yun, Juhan Yoo, Kijung Kim, Jeongho Lee, Paul Hongsuck Seo, Dong Hwan Kim,
- Abstract summary: We use graph spectral analysis to theoretically simulate and compare the frequency responses of 2D convolution and self-attention.<n>Our results corroborate previous empirical findings and reveal that node connectivity, modulated by window size, is a key factor in shaping spectral functions.<n>Based on SPAM, we develop SPANetV2 as a novel vision backbone.
- Score: 9.912286808419205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that 2D convolution and self-attention exhibit distinct spectral behaviors, and optimizing their spectral properties can enhance vision model performance. However, theoretical analyses remain limited in explaining why 2D convolution is more effective in high-pass filtering than self-attention and why larger kernels favor shape bias, akin to self-attention. In this paper, we employ graph spectral analysis to theoretically simulate and compare the frequency responses of 2D convolution and self-attention within a unified framework. Our results corroborate previous empirical findings and reveal that node connectivity, modulated by window size, is a key factor in shaping spectral functions. Leveraging this insight, we introduce a \textit{spectral-adaptive modulation} (SPAM) mixer, which processes visual features in a spectral-adaptive manner using multi-scale convolutional kernels and a spectral re-scaling mechanism to refine spectral components. Based on SPAM, we develop SPANetV2 as a novel vision backbone. Extensive experiments demonstrate that SPANetV2 outperforms state-of-the-art models across multiple vision tasks, including ImageNet-1K classification, COCO object detection, and ADE20K semantic segmentation.
Related papers
- SpectralX: Parameter-efficient Domain Generalization for Spectral Remote Sensing Foundation Models [27.717648142854696]
SpectralX is an innovative parameter-efficient fine-tuning framework that adapt existing RSFMs as backbone.<n>We develop an Attribute-oriented Mixture of Adapter (AoMoA) that dynamically aggregates multi-attribute expert knowledge.<n>By iteratively querying low-level semantic features with high-level representations, the model learns to focus on task-beneficial attributes.
arXiv Detail & Related papers (2025-08-03T12:14:38Z) - DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [66.41802970528133]
Molecular structure elucidation from spectra is a foundational problem in chemistry.<n>Traditional methods rely heavily on expert interpretation and lack scalability.<n>We present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data.
arXiv Detail & Related papers (2025-07-09T13:57:20Z) - CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis [75.25966323298003]
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding.
variability in channel dimensionality and captured wavelengths among spectral cameras impede the development of AI-driven methodologies.
We introduce $textbfCARL$, a model for $textbfC$amera-$textbfA$gnostic $textbfR$esupervised $textbfL$ across RGB, multispectral, and hyperspectral imaging modalities.
arXiv Detail & Related papers (2025-04-27T13:06:40Z) - FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation [50.9040167152168]
We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system.<n>We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain.<n>To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB)<n>We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - Approaching Deep Learning through the Spectral Dynamics of Weights [41.948042468042374]
spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to clarify and unify several phenomena in deep learning.
We identify a consistent bias in optimization across various experiments, from small-scale grokking'' to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers.
arXiv Detail & Related papers (2024-08-21T17:48:01Z) - GrassNet: State Space Model Meets Graph Neural Network [57.62885438406724]
Graph State Space Network (GrassNet) is a novel graph neural network with theoretical support that provides a simple yet effective scheme for designing arbitrary graph spectral filters.
To the best of our knowledge, our work is the first to employ SSMs for the design of graph GNN spectral filters.
Extensive experiments on nine public benchmarks reveal that GrassNet achieves superior performance in real-world graph modeling tasks.
arXiv Detail & Related papers (2024-08-16T07:33:58Z) - Spectral Graph Reasoning Network for Hyperspectral Image Classification [0.43512163406551996]
Convolutional neural networks (CNNs) have achieved remarkable performance in hyperspectral image (HSI) classification.
We propose a spectral graph reasoning network (SGR) learning framework comprising two crucial modules.
Experiments on two HSI datasets demonstrate that the proposed architecture can significantly improve the classification accuracy.
arXiv Detail & Related papers (2024-07-02T20:29:23Z) - Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction [15.537910100051866]
We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI)
We propose Coarse-Fine Spectral-Aware Deformable Convolution Network (CFSDCN)
Our CFSDCN significantly outperforms previous state-of-the-art (SOTA) methods on both simulated and real HSI datasets.
arXiv Detail & Related papers (2024-06-18T15:15:12Z) - Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z) - SpectralGPT: Spectral Remote Sensing Foundation Model [60.023956954916414]
A universal RS foundation model, named SpectralGPT, is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT)
Compared to existing foundation models, SpectralGPT accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience.
arXiv Detail & Related papers (2023-11-13T07:09:30Z) - ESSAformer: Efficient Transformer for Hyperspectral Image
Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - Unsupervised Spectral Demosaicing with Lightweight Spectral Attention
Networks [6.7433262627741914]
This paper presents a deep learning-based spectral demosaicing technique trained in an unsupervised manner.
The proposed method outperforms conventional unsupervised methods in terms of spatial distortion suppression, spectral fidelity, robustness, and computational cost.
arXiv Detail & Related papers (2023-07-05T02:45:44Z) - Boosting the Generalization Ability for Hyperspectral Image Classification using Spectral-spatial Axial Aggregation Transformer [14.594398447576188]
In the hyperspectral image classification (HSIC) task, the most commonly used model validation paradigm is partitioning the training-test dataset through pixel-wise random sampling.
In our experiments, we found that the high accuracy was reached because the training and test datasets share a lot of information.
We propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, that preserves generalization across dataset partitions.
arXiv Detail & Related papers (2023-06-29T07:55:43Z) - Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image
Reconstruction [127.20208645280438]
Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement.
Modeling the inter-spectra interactions is beneficial for HSI reconstruction.
Mask-guided Spectral-wise Transformer (MST) proposes a novel framework for HSI reconstruction.
arXiv Detail & Related papers (2021-11-15T16:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.