Cross-Scan Mamba with Masked Training for Robust Spectral Imaging
- URL: http://arxiv.org/abs/2408.00629v2
- Date: Fri, 06 Dec 2024 23:30:10 GMT
- Title: Cross-Scan Mamba with Masked Training for Robust Spectral Imaging
- Authors: Wenzhe Tian, Haijin Zeng, Yin-Ping Zhao, Yongyong Chen, Zhen Wang, Xuelong Li
- Abstract summary: We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding.
Experimental results show that CS-Mamba achieves state-of-the-art performance and that the masked training method better reconstructs smooth features, improving visual quality.
- Score: 51.557804095896174
- Abstract: Snapshot Compressive Imaging (SCI) enables fast spectral imaging but requires effective decoding algorithms for hyperspectral image (HSI) reconstruction from compressed measurements. Current CNN-based methods are limited in modeling long-range dependencies, while Transformer-based models face high computational complexity. Although recent Mamba models outperform CNNs and Transformers in RGB tasks in terms of computational efficiency or accuracy, they are not specifically optimized to fully leverage the local spatial and spectral correlations inherent in HSIs. To address this, we propose the Cross-Scanning Mamba, named CS-Mamba, which employs a Spatial-Spectral SSM for global-local balanced context encoding and cross-channel interaction promotion. Moreover, while current reconstruction algorithms perform increasingly well in simulation scenarios, they exhibit suboptimal performance on real data due to limited generalization capability. During training, the model may not capture the inherent features of the images but instead learn parameters that mitigate specific noise and loss, which may lead to a decline in reconstruction quality when faced with real scenes. To overcome this challenge, we propose a masked training method to enhance the generalization ability of models. Experimental results show that CS-Mamba achieves state-of-the-art performance and that the masked training method better reconstructs smooth features, improving visual quality.
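The abstract refers to reconstructing an HSI from compressed measurements without spelling out how such a measurement arises. As a rough illustration only, the sketch below simulates the mask-shift-sum model commonly used for coded aperture snapshot systems: each spectral band is multiplied by a coded mask, shifted horizontally by an amount that grows with the band index, and summed into one 2D image. This is not taken from the paper; the function name `cassi_measure` and the `step` parameter are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
def cassi_measure(cube, mask, step=1):
    """Simulate a mask-shift-sum snapshot measurement (illustrative only).

    cube: list of spectral bands, each an H x W list of lists.
    mask: H x W coded aperture with values in [0, 1].
    step: assumed dispersion shift in pixels per band (hypothetical).
    Returns an H x (W + step * (bands - 1)) 2D measurement.
    """
    bands = len(cube)
    height, width = len(mask), len(mask[0])
    out_width = width + step * (bands - 1)
    measurement = [[0.0] * out_width for _ in range(height)]
    for k, band in enumerate(cube):
        shift = k * step  # each band lands at a different horizontal offset
        for i in range(height):
            for j in range(width):
                # Modulate by the mask, then accumulate at the shifted column.
                measurement[i][j + shift] += mask[i][j] * band[i][j]
    return measurement


# Toy example: 2 bands of 2 x 2 pixels and a binary mask.
cube = [
    [[1.0, 2.0], [3.0, 4.0]],  # band 0
    [[5.0, 6.0], [7.0, 8.0]],  # band 1
]
mask = [[1, 0], [0, 1]]
y = cassi_measure(cube, mask)
```

The toy cube has 8 values but the measurement has only 6, and several of those mix contributions from different bands. Reconstruction is therefore ill-posed, which is why learned decoders such as the one proposed here are needed.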
Related papers
- Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction.
We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages.
We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
- MAL: Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance [2.45239928345171]
We introduce MAL (Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance), a novel framework that enhances xLSTM's capabilities through innovative pretraining strategies.
We propose a cluster-masked masking method that significantly improves local feature capture and optimizes image scanning efficiency.
Our universal encoder-decoder pretraining approach integrates multiple tasks, including image autoregression, depth estimation, and image segmentation, thereby enhancing the model's adaptability and robustness across diverse visual tasks.
arXiv Detail & Related papers (2024-12-14T07:58:24Z)
- Scalable Visual State Space Model with Fractal Scanning [16.077348474371547]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer models.
We propose using fractal scanning curves for patch serialization.
We validate our method in image classification, detection, and segmentation tasks.
arXiv Detail & Related papers (2024-05-23T12:12:11Z)
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification [4.389334324926174]
This study introduces the Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt to deploy a State Space Model (SSM) for this task.
The MiM model includes 1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence data, 2) a Tokenized Mamba (T-Mamba) encoder, and 3) a Weighted MCS Fusion (WMF) module.
Experimental results from three public HSI datasets demonstrate that our method outperforms existing baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-20T13:19:02Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Exploring Effective Mask Sampling Modeling for Neural Image Compression [171.35596121939238]
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy.
Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression.
Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
arXiv Detail & Related papers (2023-06-09T06:50:20Z)
- Calibrated Hyperspectral Image Reconstruction via Graph-based Self-Tuning Network [40.71031760929464]
Hyperspectral imaging (HSI) has attracted increasing research attention, especially for systems based on coded aperture snapshot spectral imaging (CASSI).
Existing deep HSI reconstruction models are generally trained on paired data to retrieve original signals upon 2D compressed measurements given by a particular optical hardware mask in CASSI.
This mask-specific training style will lead to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models among different hardware and noisy environments.
We propose a novel Graph-based Self-Tuning (GST) network to reason uncertainties adapting to varying spatial structures of masks among
arXiv Detail & Related papers (2021-12-31T09:39:13Z)
- Adaptive Gradient Balancing for Undersampled MRI Reconstruction and Image-to-Image Translation [60.663499381212425]
We enhance the image quality by using a Wasserstein Generative Adversarial Network combined with a novel Adaptive Gradient Balancing technique.
In MRI, our method minimizes artifacts, while maintaining a high-quality reconstruction that produces sharper images than other techniques.
arXiv Detail & Related papers (2021-04-05T13:05:22Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed using expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)