Exploring Text-Guided Single Image Editing for Remote Sensing Images
- URL: http://arxiv.org/abs/2405.05769v1
- Date: Thu, 9 May 2024 13:45:04 GMT
- Title: Exploring Text-Guided Single Image Editing for Remote Sensing Images
- Authors: Fangzhou Han, Lingyu Si, Hongwei Dong, Lamei Zhang, Hao Chen, Bo Du,
- Abstract summary: This letter proposes a diffusion based method to fulfill stable and controllable remote sensing image editing with text guidance.
Our method avoids the use of a large number of paired image, and can achieve good image editing results using only a single image.
- Score: 30.23541304590692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence Generative Content (AIGC) technologies have significantly influenced the remote sensing domain, particularly in the realm of image generation. However, remote sensing image editing, an equally vital research area, has not garnered sufficient attention. Different from text-guided editing in natural images, which relies on extensive text-image paired data for semantic correlation, the application scenarios of remote sensing image editing are often extreme, such as forest on fire, so it is difficult to obtain sufficient paired samples. At the same time, the lack of remote sensing semantics and the ambiguity of text also restrict the further application of image editing in remote sensing field. To solve above problems, this letter proposes a diffusion based method to fulfill stable and controllable remote sensing image editing with text guidance. Our method avoids the use of a large number of paired image, and can achieve good image editing results using only a single image. The quantitative evaluation system including CLIP score and subjective evaluation metrics shows that our method has better editing effect on remote sensing images than the existing image editing model.
Related papers
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit the given synthetic or real image to meet the specific requirements from users.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z) - Knowledge-aware Text-Image Retrieval for Remote Sensing Images [6.4527372338977]
Cross-modal text-image retrieval often suffers from information asymmetry between texts and images.
By mining relevant information from an external knowledge graph, we propose a Knowledge-aware Text-Image Retrieval.
We show that the proposed knowledge-aware method leads to varied and consistent retrievals, outperforming state-of-the-art retrieval methods.
arXiv Detail & Related papers (2024-05-06T11:27:27Z) - Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing [49.419619882284906]
Ground-A-Score is a powerful model-agnostic image editing method by incorporating grounding during score distillation.
The selective application with a new penalty coefficient and contrastive loss helps to precisely target editing areas.
Both qualitative assessments and quantitative analyses confirm that Ground-A-Score successfully adheres to the intricate details of extended and multifaceted prompts.
arXiv Detail & Related papers (2024-03-20T12:40:32Z) - ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE.
We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix.
We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segment model.
arXiv Detail & Related papers (2023-12-28T02:54:34Z) - Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image
Captioning [49.48946808024608]
We propose a novel two-stage vision-language pre-training-based approach to bootstrap interactive image-text alignment for remote sensing image captioning, called BITA.
Specifically, the first stage involves preliminary alignment through image-text contrastive learning.
In the second stage, the interactive Fourier Transformer connects the frozen image encoder with a large language model.
arXiv Detail & Related papers (2023-12-02T17:32:17Z) - Dynamic Prompt Learning: Addressing Cross-Attention Leakage for
Text-Based Image Editing [23.00202969969574]
We propose Dynamic Prompt Learning (DPL) to force cross-attention maps to focus on correct noun words in the text prompt.
We show improved prompt editing results for Word-Swap, Prompt Refinement, and Attention Re-weighting, especially for complex multi-object scenes.
arXiv Detail & Related papers (2023-09-27T13:55:57Z) - Zero-shot Inversion Process for Image Attribute Editing with Diffusion
Models [9.924851219904843]
We propose a framework that injects a fusion of generated visual reference and text guidance into the semantic latent space of a pre-trained diffusion model.
Only using a tiny neural network, the proposed ZIP produces diverse content and attributes under the intuitive control of the text prompt.
Compared to state-of-the-art methods, ZIP produces images of equivalent quality while providing a realistic editing effect.
arXiv Detail & Related papers (2023-08-30T08:40:15Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is able to automatically generate a mask highlighting regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.