Few-Shot Object Detection via Spatial-Channel State Space Model
- URL: http://arxiv.org/abs/2507.15308v1
- Date: Mon, 21 Jul 2025 07:08:19 GMT
- Title: Few-Shot Object Detection via Spatial-Channel State Space Model
- Authors: Zhimeng Xin, Tianxu Wu, Yixiong Zou, Shiming Chen, Dingjie Fu, Xinge You
- Abstract summary: Current methods may struggle to accurately extract effective features from each channel. We propose a Spatial-Channel State Space Modeling (SCSM) module for spatial-channel state modeling. The SCSM module highlights the effective patterns and rectifies the ineffective ones in feature channels.
- Score: 9.644454618045133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the limited training samples in few-shot object detection (FSOD), we observe that current methods may struggle to accurately extract effective features from each channel. Specifically, this issue manifests in two aspects: i) channels with high weights may not necessarily be effective, and ii) channels with low weights may still hold significant value. To handle this problem, we consider utilizing the inter-channel correlation to facilitate the novel model's adaptation process to novel conditions, ensuring the model can correctly highlight effective channels and rectify those incorrect ones. Since the channel sequence is also 1-dimensional, its similarity with the temporal sequence inspires us to take Mamba for modeling the correlation in the channel sequence. Based on this concept, we propose a Spatial-Channel State Space Modeling (SCSM) module for spatial-channel state modeling, which highlights the effective patterns and rectifies those ineffective ones in feature channels. In SCSM, we design the Spatial Feature Modeling (SFM) module to balance the learning of spatial relationships and channel relationships, and then introduce the Channel State Modeling (CSM) module based on Mamba to learn correlation in channels. Extensive experiments on the VOC and COCO datasets show that the SCSM module enables the novel detector to improve the quality of focused feature representation in channels and achieve state-of-the-art performance.
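The idea of treating the channel axis as a 1-dimensional sequence and modeling it with a state space recurrence can be illustrated with a small sketch. This is not the authors' SCSM, SFM, or CSM implementation: the function names, the scalar state, the bidirectional scan, and the sigmoid gating are all assumptions made for illustration. It pools each channel to one value, runs a simplified input-dependent (selective) recurrence along the channel sequence, and uses the result to reweight the channels.

```python
import math


def softplus(v):
    return math.log1p(math.exp(v))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_descriptors(feature_map):
    """Global-average-pool each channel of a C x H x W nested list."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]


def selective_scan(xs, w_delta=1.0, c_out=1.0):
    """Scalar selective state space recurrence over a 1-D sequence.

    Assumed toy setting: continuous-time A = -1 and B = 1, discretized
    with zero-order hold using an input-dependent step size delta.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step (the "selective" part)
        a_bar = math.exp(-delta)        # exp(delta * A) with A = -1
        b_bar = 1.0 - a_bar             # (exp(delta * A) - 1) / A * B
        h = a_bar * h + b_bar * x       # state update
        ys.append(c_out * h)            # readout
    return ys


def reweight_channels(feature_map):
    """Gate every channel by a score derived from its neighbors in the sequence."""
    desc = channel_descriptors(feature_map)
    fwd = selective_scan(desc)
    bwd = selective_scan(desc[::-1])[::-1]  # backward pass (an assumption, not from the paper)
    gates = [sigmoid(f + b) for f, b in zip(fwd, bwd)]
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return scaled, gates


if __name__ == "__main__":
    fmap = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]],
        [[-1.0, -1.0], [-1.0, -1.0]],
    ]
    _, gates = reweight_channels(fmap)
    print(gates)
```

Because each gate depends on the running state rather than on a single channel's own statistic, a channel's weight can be raised or lowered by the channels around it, which is the kind of inter-channel correlation the abstract describes. A real Mamba block uses learned, vector-valued parameters and a hardware-aware scan, none of which is reproduced here.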
Related papers
- Controllable diffusion-based generation for multi-channel biological data [66.44042377817074]
This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. We show state-of-the-art performance across both spatial and non-spatial prediction tasks, including protein imputation in IMC and gene-to-protein prediction in single-cell datasets.
arXiv Detail & Related papers (2025-06-24T00:56:21Z) - A CGAN-LSTM-Based Framework for Time-Varying Non-Stationary Channel Modeling [18.899432402460565]
This paper emphasizes the generation of long-term dynamic channels to capture the evolution of non-stationary channel properties. We propose a hybrid deep learning framework that combines conditional generative adversarial networks (CGAN) with long short-term memory (LSTM) networks. A stationarity-constrained approach is designed to ensure temporal correlation of the generated time-series channel.
arXiv Detail & Related papers (2025-03-03T03:27:45Z) - CFFormer: Cross CNN-Transformer Channel Attention and Spatial Feature Fusion for Improved Segmentation of Heterogeneous Medical Images [29.68616115427831]
Medical image segmentation plays an important role in computer-aided diagnosis. Due to limitations of medical imaging devices, medical images exhibit significant heterogeneity, posing challenges for segmentation. We propose a hybrid CNN-Transformer model, called CFFormer, which leverages effective channel feature extraction.
arXiv Detail & Related papers (2025-01-07T08:59:20Z) - Channel-Aware Low-Rank Adaptation in Time Series Forecasting [43.684035409535696]
Two representative channel strategies are closely associated with model expressivity and robustness.
We present a channel-aware low-rank adaptation method to condition CD models on identity-aware individual components.
arXiv Detail & Related papers (2024-07-24T13:05:17Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration [7.292363114816646]
We introduce the Channel-Aware U-Shaped Mamba model, which incorporates a dual State Space Model framework into the U-Net architecture.
Experiments validate CU-Mamba's superiority over existing state-of-the-art methods.
arXiv Detail & Related papers (2024-04-17T22:02:22Z) - From Similarity to Superiority: Channel Clustering for Time Series Forecasting [61.96777031937871]
We develop a novel and adaptable Channel Clustering Module (CCM).
CCM dynamically groups channels characterized by intrinsic similarities and leverages cluster information instead of individual channel identities.
CCM can boost the performance of CI and CD models by an average margin of 2.4% and 7.2% on long-term and short-term forecasting, respectively.
arXiv Detail & Related papers (2024-03-31T02:46:27Z) - Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z) - MIMO-GAN: Generative MIMO Channel Modeling [13.277946558463201]
We propose generative channel modeling to learn statistical channel models from channel input-output measurements.
We leverage advances in GAN, which helps us learn an implicit distribution over channels from observed measurements.
arXiv Detail & Related papers (2022-03-16T12:36:38Z) - Dual Attention GANs for Semantic Image Synthesis [101.36015877815537]
We propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images.
We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM).
DAGAN achieves remarkably better results than state-of-the-art methods, while using fewer model parameters.
arXiv Detail & Related papers (2020-08-29T17:49:01Z) - Data-Driven Symbol Detection via Model-Based Machine Learning [117.58188185409904]
We review a data-driven framework for symbol detection design which combines machine learning (ML) and model-based algorithms.
In this hybrid approach, well-known channel-model-based algorithms are augmented with ML-based algorithms to remove their channel-model-dependence.
Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowing the exact channel input-output statistical relationship.
arXiv Detail & Related papers (2020-02-14T06:58:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.