Multi-Sensor Diffusion-Driven Optical Image Translation for Large-Scale Applications
- URL: http://arxiv.org/abs/2404.11243v4
- Date: Wed, 04 Dec 2024 11:23:37 GMT
- Title: Multi-Sensor Diffusion-Driven Optical Image Translation for Large-Scale Applications
- Authors: João Gabriel Vinholi, Marco Chini, Anis Amziane, Renato Machado, Danilo Silva, Patrick Matgen
- Abstract summary: We propose a method that super-resolves large-scale low spatial resolution images into high-resolution equivalents from disparate optical sensors. Our approach provides precise domain adaptation, preserving image content while improving radiometric accuracy and feature representation. We reach a mean Learned Perceptual Image Patch Similarity (mLPIPS) of 0.1884 and a Fréchet Inception Distance (FID) of 45.64, substantially outperforming all compared methods.
- Score: 3.4085512042262374
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Comparing images captured by disparate sensors is a common challenge in remote sensing. This requires image translation -- converting imagery from one sensor domain to another while preserving the original content. Denoising Diffusion Implicit Models (DDIM) are potential state-of-the-art solutions for such domain translation due to their proven superiority in multiple image-to-image translation tasks in computer vision. However, these models struggle to reproduce the radiometric features of large-scale multi-patch imagery, resulting in inconsistencies across the full image. This renders downstream tasks like Heterogeneous Change Detection impractical. To overcome these limitations, we propose a method that leverages denoising diffusion for effective multi-sensor optical image translation over large areas. Our approach super-resolves large-scale low spatial resolution images into high-resolution equivalents from disparate optical sensors, ensuring uniformity across hundreds of patches. Our contributions lie in new forward and reverse diffusion processes that address the challenges of large-scale image translation. Extensive experiments using paired Sentinel-II (10 m) and Planet Dove (3 m) images demonstrate that our approach provides precise domain adaptation, preserving image content while improving radiometric accuracy and feature representation. A thorough image quality assessment and comparisons with the standard DDIM framework and five other leading methods are presented. We reach a mean Learned Perceptual Image Patch Similarity (mLPIPS) of 0.1884 and a Fréchet Inception Distance (FID) of 45.64, substantially outperforming all compared methods, including DDIM, ShuffleMixer, and SwinIR. The usefulness of our approach is further demonstrated in two Heterogeneous Change Detection tasks.
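The abstract notes that patch-wise translation of a large image can produce radiometric inconsistencies across the full image. The toy sketch below illustrates that effect only; it is not the authors' diffusion method. A per-patch contrast stretch stands in for any model that processes each patch with its own statistics, and all function names (split_into_patches, merge_patches, translate_per_patch, translate_global, max_seam_jump) are hypothetical helpers introduced for illustration.

```python
def split_into_patches(image, patch):
    """Split a 2-D list-of-lists image into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = {}
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles[(top, left)] = [row[left:left + patch]
                                  for row in image[top:top + patch]]
    return tiles


def merge_patches(tiles, height, width):
    """Reassemble tiles (keyed by their top-left corner) into a full image."""
    image = [[0.0] * width for _ in range(height)]
    for (top, left), tile in tiles.items():
        for dy, row in enumerate(tile):
            image[top + dy][left:left + len(row)] = row
    return image


def stretch(tile, lo, hi):
    """Linearly rescale pixel values so that lo maps to 0 and hi maps to 1."""
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in tile]


def translate_per_patch(image, patch):
    """Toy stand-in for a per-patch model: each tile uses its own min/max."""
    out = {}
    for key, tile in split_into_patches(image, patch).items():
        values = [v for row in tile for v in row]
        out[key] = stretch(tile, min(values), max(values))
    return merge_patches(out, len(image), len(image[0]))


def translate_global(image, patch):
    """Same stretch, but with statistics shared across all tiles."""
    values = [v for row in image for v in row]
    lo, hi = min(values), max(values)
    out = {key: stretch(tile, lo, hi)
           for key, tile in split_into_patches(image, patch).items()}
    return merge_patches(out, len(image), len(image[0]))


def max_seam_jump(image, patch):
    """Largest absolute step between neighbouring pixels across vertical tile seams."""
    return max(abs(row[x] - row[x - 1])
               for row in image
               for x in range(patch, len(row), patch))


# A smooth horizontal gradient: 4 rows, 8 columns, pixel value equals column index.
gradient = [[float(x) for x in range(8)] for _ in range(4)]
per_patch = translate_per_patch(gradient, patch=4)
shared = translate_global(gradient, patch=4)
print("seam jump, per-patch statistics:", max_seam_jump(per_patch, 4))
print("seam jump, shared statistics:", max_seam_jump(shared, 4))
```

On this smooth gradient, the per-patch variant introduces a visible discontinuity at the tile boundary, while the variant with shared statistics keeps the step between neighbouring pixels uniform. This is the kind of seam a large-scale translation method has to avoid across hundreds of patches.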
Related papers
- OSDM-MReg: Multimodal Image Registration based One Step Diffusion Model [8.619958921346184]
Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis.
We propose OSDM-MReg, a novel multimodal image registration framework based on image-to-image translation.
Experiments demonstrate superior accuracy and efficiency across various multimodal registration tasks.
arXiv Detail & Related papers (2025-04-08T13:32:56Z) - MODEL&CO: Exoplanet detection in angular differential imaging by learning across multiple observations [37.845442465099396]
Most post-processing methods build a model of the nuisances from the target observations themselves.
We propose to build the nuisance model from an archive of multiple observations by leveraging supervised deep learning techniques.
We apply the proposed algorithm to several datasets from the VLT/SPHERE instrument, and demonstrate a superior precision-recall trade-off.
arXiv Detail & Related papers (2024-09-23T09:22:45Z) - Cross-Domain Separable Translation Network for Multimodal Image Change Detection [11.25422609271201]
Multimodal change detection (MCD) is particularly critical in the remote sensing community.
This paper focuses on addressing the challenges of MCD, especially the difficulty in comparing images from different sensors.
A novel unsupervised cross-domain separable translation network (CSTN) is proposed to overcome these limitations.
arXiv Detail & Related papers (2024-07-23T03:56:02Z) - Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation [5.234109158596138]
We propose a new training framework for SAR-to-optical image translation.
Our method employs consistency distillation to reduce iterative inference steps and integrates adversarial learning to ensure image clarity and minimize color shifts.
The results demonstrate that our approach significantly improves inference speed by 131 times while maintaining the visual quality of the generated images.
arXiv Detail & Related papers (2024-07-08T16:36:12Z) - Rethinking Score Distillation as a Bridge Between Image Distributions [97.27476302077545]
We show that our method seeks to transport corrupted images (source) to the natural image distribution (target).
Our method can be easily applied across many domains, matching or beating the performance of specialized methods.
We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
arXiv Detail & Related papers (2024-06-13T17:59:58Z) - Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior [13.148815217684277]
Large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit.
Existing methods confront challenges in recovering SR images with clear textures and correct ground objects.
We introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution.
arXiv Detail & Related papers (2024-05-11T16:06:16Z) - Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder [29.924160271522354]
Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications.
Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts.
Most relevant work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results.
We propose a novel pipeline that can super-resolve an input image or generate from a random noise a novel image at arbitrary scales.
arXiv Detail & Related papers (2024-03-15T12:45:40Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Denoising Score (CDS) for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z) - Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution --
a Non-Denoising Model [13.326634982790528]
We propose a simple approach which gets away from using Gaussian noise but adopts some basic structures of diffusion models for efficient image super-resolution.
Experimental results show that our method outperforms not only state-of-the-art large scale super resolution models, but also the current diffusion models for image super-resolution.
arXiv Detail & Related papers (2023-11-04T09:57:50Z) - Denoising Diffusion Models for Plug-and-Play Image Restoration [135.6359475784627]
This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework.
Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models.
arXiv Detail & Related papers (2023-05-15T20:24:38Z) - Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion [67.35540259040806]
We propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net.
As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components.
We append a self-supervised learning module behind the CSU net by guaranteeing the material consistency to enhance the detailed appearances of the restored HS product.
arXiv Detail & Related papers (2022-05-07T23:40:36Z) - A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z) - Boosting Image Super-Resolution Via Fusion of Complementary Information Captured by Multi-Modal Sensors [21.264746234523678]
Image Super-Resolution (SR) provides a promising technique to enhance the image quality of low-resolution optical sensors.
In this paper, we attempt to leverage complementary information from a low-cost channel (visible/depth) to boost image quality of an expensive channel (thermal) using fewer parameters.
arXiv Detail & Related papers (2020-12-07T02:15:28Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z) - DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning [122.51237307910878]
We develop methods for few-shot image classification from a new perspective of optimal matching between image regions.
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations.
To generate the important weights of elements in the formulation, we design a cross-reference mechanism.
arXiv Detail & Related papers (2020-03-15T08:13:16Z) - PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models [77.32079593577821]
PULSE (Photo Upsampling via Latent Space Exploration) generates high-resolution, realistic images at resolutions previously unseen in the literature.
Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.
arXiv Detail & Related papers (2020-03-08T16:44:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.