CrossDiff: Exploring Self-Supervised Representation of Pansharpening via
Cross-Predictive Diffusion Model
- URL: http://arxiv.org/abs/2401.05153v2
- Date: Sat, 13 Jan 2024 06:35:34 GMT
- Title: CrossDiff: Exploring Self-Supervised Representation of Pansharpening via
Cross-Predictive Diffusion Model
- Authors: Yinghui Xing, Litao Qu, Shizhou Zhang, Kai Zhang, Yanning Zhang
- Abstract summary: Fusion of a panchromatic (PAN) image and corresponding multispectral (MS) image is also known as pansharpening.
Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution.
We propose to explore the self-supervised representation of pansharpening by designing a cross-predictive diffusion model, named CrossDiff.
- Score: 42.39485365164292
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Fusion of a panchromatic (PAN) image and corresponding multispectral (MS)
image is also known as pansharpening, which aims to combine abundant spatial
details of PAN and spectral information of MS. Due to the absence of
high-resolution MS images, available deep-learning-based methods usually follow
the paradigm of training at reduced resolution and testing at both reduced and
full resolution. When taking original MS and PAN images as inputs, they always
obtain sub-optimal results due to the scale variation. In this paper, we
propose to explore the self-supervised representation of pansharpening by
designing a cross-predictive diffusion model, named CrossDiff. It has two-stage
training. In the first stage, we introduce a cross-predictive pretext task to
pre-train the UNet structure based on conditional DDPM, while in the second
stage, the encoders of the UNets are frozen to directly extract spatial and
spectral features from PAN and MS, and only the fusion head is trained to adapt
to the pansharpening task. Extensive experiments show the effectiveness and
superiority of the proposed model compared with state-of-the-art supervised and
unsupervised methods. Besides, cross-sensor experiments also verify the
generalization ability of the proposed self-supervised representation learners
on other satellites' datasets. We will release our code for reproducibility.
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
- Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion.
We propose a Cross-modality Multi-scale Progressive Dense Registration scheme.
This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z)
- Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model [43.71116483554516]
Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image.
Existing Bayesian-based HS pansharpening methods require designing handcrafted image priors to characterize the image features.
We propose a low-rank diffusion model for hyperspectral pansharpening by simultaneously leveraging the power of the pre-trained deep diffusion model and better generalization ability of Bayesian methods.
arXiv Detail & Related papers (2023-05-18T12:38:29Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to scale-space theory.
We build a novel network named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening [50.943080184828524]
We propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information.
The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously.
arXiv Detail & Related papers (2022-07-29T03:09:21Z)
- Unsupervised Cycle-consistent Generative Adversarial Networks for Pan-sharpening [41.68141846006704]
We propose an unsupervised generative adversarial framework that learns from the full-scale images without the ground truths to alleviate this problem.
We extract the modality-specific features from the PAN and MS images with a two-stream generator, perform fusion in the feature domain, and then reconstruct the pan-sharpened images.
Results demonstrate that the proposed method can greatly improve the pan-sharpening performance on the full-scale images.
arXiv Detail & Related papers (2021-09-20T09:43:24Z)
- Hyperspectral Pansharpening Based on Improved Deep Image Prior and Residual Reconstruction [64.10636296274168]
Hyperspectral pansharpening aims to synthesize a low-resolution hyperspectral image (LR-HSI) with a registered panchromatic image (PAN) to generate an enhanced HSI with high spectral and spatial resolution.
Recently proposed HS pansharpening methods have obtained remarkable results using deep convolutional networks (ConvNets).
We propose a novel over-complete network, called HyperKite, which focuses on learning high-level features by constraining the receptive field from increasing in the deep layers.
arXiv Detail & Related papers (2021-07-06T14:11:03Z) - PGMAN: An Unsupervised Generative Multi-adversarial Network for
Pan-sharpening [46.84573725116611]
We propose an unsupervised framework that learns directly from the full-resolution images without any preprocessing.
We use a two-stream generator to extract the modality-specific features from the PAN and MS images, respectively, and develop a dual-discriminator to preserve the spectral and spatial information of the inputs when performing fusion.
arXiv Detail & Related papers (2020-12-16T16:21:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.