Remote Diffusion
- URL: http://arxiv.org/abs/2405.04717v1
- Date: Tue, 7 May 2024 23:44:09 GMT
- Title: Remote Diffusion
- Authors: Kunal Sunil Kasodekar
- Abstract summary: I explored adapting Stable Diffusion v1.5 for generating domain-specific satellite and aerial images in remote sensing.
I used the RSICD dataset to train a Stable Diffusion model with a loss of 0.2.
I created a synthetic dataset for a Land Use Land Classification (LULC) task, employing prompting techniques with RAG and ChatGPT.
Despite extensive fine-tuning and dataset iterations, results indicated subpar image quality and realism, as indicated by high FID scores and domain-expert evaluation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: I explored adapting Stable Diffusion v1.5 for generating domain-specific satellite and aerial images in remote sensing. Recognizing the limitations of existing models like Midjourney and Stable Diffusion, trained primarily on natural RGB images and lacking context for remote sensing, I used the RSICD dataset to train a Stable Diffusion model with a loss of 0.2. I incorporated descriptive captions from the dataset for text-conditioning. Additionally, I created a synthetic dataset for a Land Use Land Classification (LULC) task, employing prompting techniques with RAG and ChatGPT and fine-tuning a specialized remote sensing LLM. However, I faced challenges with prompt quality and model performance. I trained a classification model (ResNet18) on the synthetic dataset achieving 49.48% test accuracy in TorchGeo to create a baseline. Quantitative evaluation through FID scores and qualitative feedback from domain experts assessed the realism and quality of the generated images and dataset. Despite extensive fine-tuning and dataset iterations, results indicated subpar image quality and realism, as indicated by high FID scores and domain-expert evaluation. These findings call attention to the potential of diffusion models in remote sensing while highlighting significant challenges related to insufficient pretraining data and computational resources.
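For context on the FID scores mentioned in the abstract, the following is a minimal sketch of the Fréchet distance between two sets of feature vectors, which is the quantity FID reports. It is not the paper's evaluation code: the function name `frechet_distance` is hypothetical, and the inputs here are plain arrays, whereas a real FID pipeline would pass Inception-v3 features extracted from real and generated images.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each argument has shape (n_samples, n_features) with n_features >= 2.
    Using raw arrays instead of Inception-v3 features is an assumption
    made to keep this sketch self-contained.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr(sqrt(cov_r @ cov_f)) equals the sum of the square roots of the
    # eigenvalues of the product; for positive semi-definite covariances
    # these are real and non-negative up to numerical noise, so clip.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)
```

Lower values mean the two feature distributions are closer; identical sets give a distance of approximately zero, and shifting every feature by a constant adds the squared length of that shift.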
Related papers
- RSFAKE-1M: A Large-Scale Dataset for Detecting Diffusion-Generated Remote Sensing Forgeries [33.844219602982555]
We introduce RSFAKE-1M, a large-scale dataset of 500K forged and 500K real remote sensing images.
The fake images are generated by ten diffusion models fine-tuned on remote sensing data.
The results reveal that diffusion-based remote sensing forgeries remain challenging for current methods.
arXiv Detail & Related papers (2025-05-29T09:30:46Z)
- Prompt-Driven and Training-Free Forgetting Approach and Dataset for Large Language Models [4.824120664293887]
We propose an Automatic dataset Creation Framework based on prompt-based layered editing and training-free local feature removal.
The ForgetMe dataset encompasses a diverse set of real and synthetic scenarios, including CUB-200-2011 (Birds), Stanford-Dogs, ImageNet, and a synthetic cat dataset.
We apply LoRA fine-tuning on Stable Diffusion to achieve selective unlearning on this dataset and validate the effectiveness of both the ForgetMe dataset and the Entangled metric.
arXiv Detail & Related papers (2025-04-17T01:44:57Z)
- Human Body Restoration with One-Step Diffusion Model and A New Benchmark [74.66514054623669]
We propose a high-quality dataset automated cropping and filtering (HQ-ACF) pipeline.
This pipeline leverages existing object detection datasets and other unlabeled images to automatically crop and filter high-quality human images.
We also propose OSDHuman, a novel one-step diffusion model for human body restoration.
arXiv Detail & Related papers (2025-02-03T14:48:40Z)
- Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model [43.93772529301279]
We propose a SRRIIE dataset with an efficient conditional diffusion probabilistic models-based method.
We capture images using an ILDC camera and an optical zoom lens with exposure levels ranging from -6 EV to 0 EV and ISO levels ranging from 50 to 12800.
We show that most existing methods are less effective in preserving the structures and sharpness of restored images from complicated noises.
arXiv Detail & Related papers (2024-10-16T18:47:04Z)
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- DiffusionSat: A Generative Foundation Model for Satellite Imagery [63.2807119794691]
We present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets.
Our method produces realistic samples and can be used to solve multiple generative tasks including temporal generation, superresolution given multi-spectral inputs and in-painting.
arXiv Detail & Related papers (2023-12-06T16:53:17Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Thermal-Infrared Remote Target Detection System for Maritime Rescue based on Data Augmentation with 3D Synthetic Data [4.66313002591741]
This paper proposes a thermal-infrared (TIR) remote target detection system for maritime rescue using deep learning and data augmentation.
To address dataset scarcity and improve model robustness, a synthetic dataset is collected from a 3D game (ARMA3) to augment the data.
The proposed segmentation model surpasses the performance of state-of-the-art segmentation methods.
arXiv Detail & Related papers (2023-10-31T12:37:49Z)
- SatDM: Synthesizing Realistic Satellite Image with Semantic Layout Conditioning using Diffusion Models [0.0]
Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated significant promise in synthesizing realistic images from semantic layouts.
In this paper, a conditional DDPM model capable of taking a semantic map and generating high-quality, diverse, and correspondingly accurate satellite images is implemented.
The effectiveness of our proposed model is validated using a meticulously labeled dataset introduced within the context of this study.
arXiv Detail & Related papers (2023-09-28T19:39:13Z)
- Diffusion Models for Interferometric Satellite Aperture Radar [73.01013149014865]
Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models.
Here, we leverage PDMs to generate several radar-based satellite image datasets.
We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue.
arXiv Detail & Related papers (2023-08-31T16:26:17Z)
- LARD - Landing Approach Runway Detection -- Dataset for Vision Based Landing [2.7400353551392853]
We present a dataset of high-quality aerial images for the task of runway detection during approach and landing phases.
Most of the dataset is composed of synthetic images but we also provide manually labelled images from real landing footages.
This dataset paves the way for further research such as the analysis of dataset quality or the development of models to cope with the detection tasks.
arXiv Detail & Related papers (2023-04-05T08:25:55Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Learning to Simulate Realistic LiDARs [66.7519667383175]
We introduce a pipeline for data-driven simulation of a realistic LiDAR sensor.
We show that our model can learn to encode realistic effects such as dropped points on transparent surfaces.
We use our technique to learn models of two distinct LiDAR sensors and use them to improve simulated LiDAR data accordingly.
arXiv Detail & Related papers (2022-09-22T13:12:54Z)
- Learning class prototypes from Synthetic InSAR with Vision Transformers [2.41710192205034]
Detection of early signs of volcanic unrest is critical for assessing volcanic hazard.
We propose a novel deep learning methodology that exploits a rich source of synthetically generated interferograms.
We report detection accuracy that surpasses the state of the art on volcanic unrest detection.
arXiv Detail & Related papers (2022-01-09T14:03:00Z)