SWAN: Self-supervised Wavelet Neural Network for Hyperspectral Image Unmixing
- URL: http://arxiv.org/abs/2510.22607v1
- Date: Sun, 26 Oct 2025 10:05:48 GMT
- Title: SWAN: Self-supervised Wavelet Neural Network for Hyperspectral Image Unmixing
- Authors: Yassh Ramchandani, Vijayashekhar S S, Jignesh S. Bhatt,
- Abstract summary: We present SWAN: a three-stage, self-supervised wavelet neural network for estimation of endmembers and abundances from hyperspectral imagery.<n>The idea is to exploit latent symmetries from thus obtained invariant and covariant features using a self-supervised learning paradigm.<n> Experiments are conducted on two benchmark synthetic data sets with different signal-to-noise ratios as well as on three real benchmark hyperspectral data sets.
- Score: 0.2624902795082451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we present SWAN: a three-stage, self-supervised wavelet neural network for joint estimation of endmembers and abundances from hyperspectral imagery. The contiguous and overlapping hyperspectral band images are first expanded to Biorthogonal wavelet basis space that provides sparse, distributed, and multi-scale representations. The idea is to exploit latent symmetries from thus obtained invariant and covariant features using a self-supervised learning paradigm. The first stage, SWANencoder maps the input wavelet coefficients to a compact lower-dimensional latent space. The second stage, SWANdecoder uses the derived latent representation to reconstruct the input wavelet coefficients. Interestingly, the third stage SWANforward learns the underlying physics of the hyperspectral image. A three-stage combined loss function is formulated in the image acquisition domain that eliminates the need for ground truth and enables self-supervised training. Adam is employed for optimizing the proposed loss function, while Sigmoid with a dropout of 0.3 is incorporated to avoid possible overfitting. Kernel regularizers bound the magnitudes and preserve spatial variations in the estimated endmember coefficients. The output of SWANencoder represents estimated abundance maps during inference, while weights of SWANdecoder are retrieved to extract endmembers. Experiments are conducted on two benchmark synthetic data sets with different signal-to-noise ratios as well as on three real benchmark hyperspectral data sets while comparing the results with several state-of-the-art neural network-based unmixing methods. The qualitative, quantitative, and ablation results show performance enhancement by learning a resilient unmixing function as well as promoting self-supervision and compact network parameters for practical applications.
Related papers
- SH-SAS: An Implicit Neural Representation for Complex Spherical-Harmonic Scattering Fields for 3D Synthetic Aperture Sonar [10.13553727839228]
We introduce SH-SAS, an implicit neural representation that expresses complex acoustic scattering field as a set of spherical harmonic coefficients.<n>Results show that SH-SAS performs better in terms of 3D reconstruction quality and geometric metrics than previous methods.
arXiv Detail & Related papers (2025-09-14T04:29:28Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.<n>We propose Self-supervised Transfer (PST) and FrequencyDe-coupled Fusion module (FreDF)<n>PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.<n>FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.<n>Traditional encoder-decoder architectures may not have adequately considered the robustness against noise originating from defective modalities.<n>We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z) - Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks [4.499833362998488]
Implicit neural representations (INRs) use neural networks to provide continuous and resolution-independent representations of complex signals.<n>The proposed FKAN utilizes learnable activation functions modeled as Fourier series in the first layer to effectively control and learn the task-specific frequency components.<n> Experimental results show that our proposed FKAN model outperforms three state-of-the-art baseline schemes.
arXiv Detail & Related papers (2024-09-14T05:53:33Z) - WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and AR KITScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion [3.868072865207522]
Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions.
We propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder.
A correlation-driven loss is proposed for low-frequency feature and high-frequency feature decomposition based on embedded information.
arXiv Detail & Related papers (2024-02-04T14:12:51Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z) - 3D Solid Spherical Bispectrum CNNs for Biomedical Texture Analysis [3.579867431007686]
Locally Rotation Invariant (LRI) operators have shown great potential in biomedical texture analysis.
We investigate the benefits of using the bispectrum over the spectrum in the design of a LRI layer embedded in a shallow Convolutional Neural Network (CNN) for 3D image analysis.
arXiv Detail & Related papers (2020-04-28T09:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.