MCGA: Mixture of Codebooks Hyperspectral Reconstruction via Grayscale-Aware Attention
- URL: http://arxiv.org/abs/2507.09885v1
- Date: Mon, 14 Jul 2025 03:46:06 GMT
- Title: MCGA: Mixture of Codebooks Hyperspectral Reconstruction via Grayscale-Aware Attention
- Authors: Zhanjiang Yang, Lijun Sun, Jiawei Dong, Xiaoxin An, Yang Liu, Meng Li,
- Abstract summary: We propose a two-stage approach, MCGA, which first learns spectral patterns before estimating the mapping. In the first stage, a multi-scale VQ-VAE learns representations from heterogeneous HSI datasets, extracting a Mixture of Codebooks (MoC). In the second stage, the RGB-to-HSI mapping is refined by querying features from the MoC to replace latent HSI representations.
- Score: 19.156831096843284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing hyperspectral images (HSI) from RGB images is a cost-effective solution for various vision-based applications. However, most existing learning-based hyperspectral reconstruction methods directly learn the RGB-to-HSI mapping using complex attention mechanisms, neglecting the inherent challenge of transitioning from low-dimensional to high-dimensional information. To address this limitation, we propose a two-stage approach, MCGA, which first learns spectral patterns before estimating the mapping. In the first stage, a multi-scale VQ-VAE learns representations from heterogeneous HSI datasets, extracting a Mixture of Codebooks (MoC). In the second stage, the RGB-to-HSI mapping is refined by querying features from the MoC to replace latent HSI representations, incorporating prior knowledge rather than forcing a direct high-dimensional transformation. To further enhance reconstruction quality, we introduce Grayscale-Aware Attention and Quantized Self-Attention, which adaptively adjust feature map intensities to meet hyperspectral reconstruction requirements. This physically motivated attention mechanism ensures lightweight and efficient HSI recovery. Moreover, we propose an entropy-based Test-Time Adaptation strategy to improve robustness in real-world scenarios. Extensive experiments demonstrate that our method, MCGA, achieves state-of-the-art performance. The code and models will be released at https://github.com/Fibonaccirabbit/MCGA
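The second-stage idea of "querying features from the MoC" can be pictured as a nearest-neighbour lookup across several codebooks. The following is only a minimal sketch of that generic vector-quantization step, not the authors' implementation: the function names, the toy 2-D codebooks, and the use of squared Euclidean distance are all assumptions made for illustration.

```python
# Illustrative sketch only: nearest-neighbour lookup over several codebooks.
# All names, sizes, and values are hypothetical, not taken from the MCGA code.
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_codebooks(
    latent: Vector, codebooks: List[List[Vector]]
) -> Tuple[int, int, Vector]:
    """Return (codebook index, entry index, entry) of the code closest to `latent`."""
    best = None
    for cb_idx, codebook in enumerate(codebooks):
        for entry_idx, entry in enumerate(codebook):
            dist = squared_distance(latent, entry)
            # Keep the first closest entry seen so far (ties keep the earlier one).
            if best is None or dist < best[0]:
                best = (dist, cb_idx, entry_idx, entry)
    if best is None:
        raise ValueError("codebooks must contain at least one entry")
    _, cb_idx, entry_idx, entry = best
    return cb_idx, entry_idx, entry


# Two toy codebooks with 2-D entries (assumed values).
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (-1.0, 2.0)],
]

# The latent vector is replaced by its nearest codebook entry.
cb_idx, entry_idx, quantized = query_codebooks((0.9, 1.2), codebooks)
print(cb_idx, entry_idx, quantized)  # 0 1 (1.0, 1.0)
```

In a learned model the codebook entries would come from the first-stage VQ-VAE and the lookup would run over every latent position; the exhaustive loop here is chosen for readability rather than speed.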
Related papers
- Leveraging Multispectral Sensors for Color Correction in Mobile Cameras [22.93423876118074]
Recent advances in snapshot multispectral (MS) imaging have enabled compact, low-cost spectral sensors for consumer and mobile devices. We propose a unified, learning-based framework that performs end-to-end color correction. We show that our approach improves color accuracy and stability, reducing error by up to 50% compared to RGB-only and MS-driven baselines.
arXiv Detail & Related papers (2025-12-09T10:14:13Z)
- HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object Detection [75.406055413928]
We propose a novel prompt-driven segment anything model (HyPSAM) for RGB-T SOD. DFNet employs dynamic convolution and multi-branch decoding to facilitate adaptive cross-modality interaction. Experiments on three public datasets demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-09-23T07:32:11Z)
- Compressive Imaging Reconstruction via Tensor Decomposed Multi-Resolution Grid Encoding [50.54887630778593]
Compressive imaging (CI) reconstruction aims to recover high-dimensional images from low-dimensional compressed measurements. Existing unsupervised representations may struggle to achieve a desired balance between representation ability and efficiency. We propose Decomposed multi-resolution Grid encoding (GridTD), an unsupervised continuous representation framework for CI reconstruction.
arXiv Detail & Related papers (2025-07-10T12:36:20Z)
- Physical Degradation Model-Guided Interferometric Hyperspectral Reconstruction with Unfolding Transformer [10.761506243784744]
Interferometric Hyperspectral Imaging (IHI) is a critical technique for large-scale remote sensing tasks. IHI is susceptible to complex errors arising from imaging steps, and its quality is limited by existing signal processing-based reconstruction algorithms. We propose a novel IHI reconstruction pipeline to address two key challenges: the lack of training datasets and the difficulty in eliminating IHI-specific degradation components.
arXiv Detail & Related papers (2025-06-27T03:36:00Z)
- Mixed-granularity Implicit Representation for Continuous Hyperspectral Compressive Reconstruction [16.975538181162616]
This study introduces a novel method using implicit neural representation for continuous hyperspectral image reconstruction. By leveraging implicit neural representations, the MGIR framework enables reconstruction at any desired spatial-spectral resolution.
arXiv Detail & Related papers (2025-03-17T03:37:42Z)
- Unleashing Correlation and Continuity for Hyperspectral Reconstruction from RGB Images [64.80875911446937]
We propose a Correlation and Continuity Network (CCNet) for HSI reconstruction from RGB images. For the correlation of local spectrum, we introduce the Group-wise Spectral Correlation Modeling (GrSCM) module. For the continuity of global spectrum, we design the Neighborhood-wise Spectral Continuity Modeling (NeSCM) module.
arXiv Detail & Related papers (2025-01-02T15:14:40Z)
- Super-Resolution for Remote Sensing Imagery via the Coupling of a Variational Model and Deep Learning [20.697932997351813]
We propose a novel gradient-guided multi-frame super-resolution (MFSR) framework for remote sensing imagery reconstruction.
arXiv Detail & Related papers (2024-12-13T04:19:48Z)
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
- Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation [19.41334573257174]
Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness.
Recent studies show thermal images are robust to the night scenario as a compensating modality for segmentation.
This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
arXiv Detail & Related papers (2023-06-17T14:28:08Z)
- Hyperspectral Image Super Resolution with Real Unaligned RGB Guidance [11.711656319221072]
We propose an HSI fusion network with heterogeneous feature extractions, multi-stage feature alignments, and attentive feature fusion.
Our method obtains a clear improvement over existing single-image and fusion-based super-resolution methods on quantitative assessment as well as visual comparison.
arXiv Detail & Related papers (2023-02-13T11:56:45Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Deep Coding Patterns Design for Compressive Near-Infrared Spectral Classification [80.93625278357229]
Spectral classification can be performed directly in the compressive domain, considering the amount of spectral information embedded in the measurements.
This work proposes an end-to-end approach to jointly design the coding patterns used in CSI and the network parameters to perform spectral classification directly from the embedded near-infrared compressive measurements.
arXiv Detail & Related papers (2022-05-27T15:55:53Z)
- Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging [142.11622043078867]
We propose a principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask, and then uses these parameters to control each iteration.
By plugging HST into DAUF, we establish the first Transformer-based deep unfolding method, Degradation-Aware Unfolding Half-Shuffle Transformer (DAUHST) for HSI reconstruction.
arXiv Detail & Related papers (2022-05-20T11:37:44Z)
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, coarse-to-fine sparse Transformer (CST).
CST embeds HSI sparsity into deep learning for HSI reconstruction.
In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selecting. Then the selected patches are fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing.
arXiv Detail & Related papers (2022-03-09T16:17:47Z)
- HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging [138.04956118993934]
We propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction.
On the one hand, the proposed HR spatial-spectral attention module with its efficient feature fusion provides continuous and fine pixel-level features.
On the other hand, frequency domain learning (FDL) is introduced for HSI reconstruction to narrow the frequency domain discrepancy.
arXiv Detail & Related papers (2022-03-04T06:37:45Z)
- Spectral Compressive Imaging Reconstruction Using Convolution and Contextual Transformer [6.929652454131988]
We propose a hybrid network module, namely CCoT (Contextual Transformer) block, which can acquire the inductive bias ability of transformer simultaneously.
We integrate the proposed CCoT block into deep unfolding framework based on the generalized alternating projection algorithm, and further propose the GAP-CT network.
arXiv Detail & Related papers (2022-01-15T06:30:03Z)
- Calibrated Hyperspectral Image Reconstruction via Graph-based Self-Tuning Network [40.71031760929464]
Hyperspectral imaging (HSI) has attracted increasing research attention, especially for approaches based on a coded aperture snapshot spectral imaging (CASSI) system.
Existing deep HSI reconstruction models are generally trained on paired data to retrieve original signals upon 2D compressed measurements given by a particular optical hardware mask in CASSI.
This mask-specific training style will lead to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models among different hardware and noisy environments.
We propose a novel Graph-based Self-Tuning (GST) network to reason uncertainties adapting to varying spatial structures of masks among
arXiv Detail & Related papers (2021-12-31T09:39:13Z)
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework to tackle this challenge.
We progressively spread the differences between input RGB images and re-projected RGB images from recovered HS images via effective camera spectral response function estimation.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
- Deep-learning-based Hyperspectral imaging through a RGB camera [6.931572045689959]
Hyperspectral images (HSI) contain both spatial patterns and spectral information and have been widely used in food safety, remote sensing, and medical detection.
Recently, it has been reported that HSI can be reconstructed from a single RGB image using convolutional neural network (CNN) algorithms.
In this study, we focused on the influence of the RGB camera spectral sensitivity (CSS) on the HSI.
arXiv Detail & Related papers (2021-07-12T04:23:25Z)
- Cascade Graph Neural Networks for RGB-D Salient Object Detection [41.57218490671026]
We study the problem of salient object detection (SOD) for RGB-D images using both color and depth information.
We introduce Cascade Graph Neural Networks (Cas-Gnn), a unified framework which is capable of comprehensively distilling and reasoning the mutual benefits between these two data sources.
Cas-Gnn achieves significantly better performance than all existing RGB-D SOD approaches on several widely-used benchmarks.
arXiv Detail & Related papers (2020-08-07T10:59:04Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of HSI.
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.