Eliminating Contextual Prior Bias for Semantic Image Editing via
Dual-Cycle Diffusion
- URL: http://arxiv.org/abs/2302.02394v3
- Date: Thu, 5 Oct 2023 14:35:08 GMT
- Title: Eliminating Contextual Prior Bias for Semantic Image Editing via
Dual-Cycle Diffusion
- Authors: Zuopeng Yang, Tianshu Chu, Xin Lin, Erdun Gao, Daqing Liu, Jie Yang,
Chaoyue Wang
- Abstract summary: A novel approach called Dual-Cycle Diffusion generates an unbiased mask to guide image editing.
Our experiments demonstrate the effectiveness of the proposed method, as it significantly improves the D-CLIP score from 0.272 to 0.283.
- Score: 35.95513392917737
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The recent success of text-to-image generation diffusion models has also
revolutionized semantic image editing, enabling the manipulation of images
based on query/target texts. Despite these advancements, a significant
challenge lies in the potential introduction of contextual prior bias in
pre-trained models during image editing, e.g., making unexpected modifications
to inappropriate regions. To address this issue, we present a novel approach
called Dual-Cycle Diffusion, which generates an unbiased mask to guide image
editing. The proposed model incorporates a Bias Elimination Cycle that consists
of both a forward path and an inverted path, each featuring a Structural
Consistency Cycle to ensure the preservation of image content during the
editing process. The forward path utilizes the pre-trained model to produce the
edited image, while the inverted path converts the result back to the source
image. The unbiased mask is generated by comparing differences between the
processed source image and the edited image to ensure that both conform to the
same distribution. Our experiments demonstrate the effectiveness of the
proposed method, as it significantly improves the D-CLIP score from 0.272 to
0.283. The code will be available at
https://github.com/JohnDreamer/DualCycleDiffsion.
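A minimal Python/PyTorch sketch of the comparison step described above is given below. It is only an illustrative reading of how an unbiased difference mask could be computed from the processed source image and the edited image (which conform to the same distribution) and then used to confine the edit; the function names, thresholding scheme, and blending are assumptions for illustration, not the authors' released code.

import torch

def unbiased_edit_mask(processed_source: torch.Tensor,
                       edited: torch.Tensor,
                       threshold: float = 0.1) -> torch.Tensor:
    # Both inputs are (C, H, W) tensors that have passed through the same
    # diffusion pipeline, so their differences reflect the requested edit
    # rather than the pre-trained model's contextual prior bias.
    diff = (processed_source - edited).abs().mean(dim=0)   # per-pixel change, (H, W)
    diff = diff / (diff.max() + 1e-8)                       # normalize to [0, 1]
    return (diff > threshold).float()                       # binary edit mask

def apply_masked_edit(source: torch.Tensor,
                      edited: torch.Tensor,
                      mask: torch.Tensor) -> torch.Tensor:
    # Keep edited content inside the mask and original content outside it,
    # preventing unexpected modifications to unrelated regions.
    return mask * edited + (1.0 - mask) * source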
Related papers
- CODE: Confident Ordinary Differential Editing [62.83365660727034]
Confident Ordinary Differential Editing (CODE) is a novel approach for image synthesis that effectively handles Out-of-Distribution (OoD) guidance images.
CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory.
Our method operates in a fully blind manner, relying solely on a pre-trained generative model.
arXiv Detail & Related papers (2024-08-22T14:12:20Z)
- TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
- Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z)
- Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models [18.75409092764653]
One crucial step in text-driven image editing is to invert the original image into a latent noise code conditioned on the source prompt.
We propose a novel method called Source Prompt Disentangled Inversion (SPDInv), which aims to reduce the impact of the source prompt.
The experimental results show that our proposed SPDInv method can effectively mitigate the conflicts between the target editing prompt and the source prompt.
arXiv Detail & Related papers (2024-03-17T06:19:30Z)
- Perceptual Similarity guidance and text guidance optimization for Editing Real Images using Guided Diffusion Models [0.6345523830122168]
We apply a dual-guidance approach to maintain high fidelity to the original in areas that are not altered.
This method ensures the realistic rendering of both the edited elements and the preservation of the unedited parts of the original image.
arXiv Detail & Related papers (2023-12-09T02:55:35Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Pix2Video: Video Editing using Image Diffusion [43.07444438561277]
We investigate how to use pre-trained image models for text-guided video editing.
Our method works in two simple steps: first, we use a pre-trained structure-guided (e.g., depth) image diffusion model to perform text-guided edits on an anchor frame.
We demonstrate that realistic text-guided video edits are possible, without any compute-intensive preprocessing or video-specific finetuning.
arXiv Detail & Related papers (2023-03-22T16:36:10Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)