Doubly Abductive Counterfactual Inference for Text-based Image Editing
- URL: http://arxiv.org/abs/2403.02981v2
- Date: Tue, 26 Mar 2024 02:39:15 GMT
- Title: Doubly Abductive Counterfactual Inference for Text-based Image Editing
- Authors: Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang
- Abstract summary: We study text-based image editing (TBIE) of a single image by counterfactual inference.
We propose a Doubly Abductive Counterfactual inference framework (DAC).
Our DAC achieves a good trade-off between editability and fidelity.
- Score: 130.46583155383735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study text-based image editing (TBIE) of a single image by counterfactual inference because it is an elegant formulation to precisely address the requirement: the edited image should retain the fidelity of the original one. Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning. To this end, we propose a Doubly Abductive Counterfactual inference framework (DAC). We first parameterize an exogenous variable as a UNet LoRA, whose abduction can encode all the image details. Second, we abduct another exogenous variable parameterized by a text encoder LoRA, which recovers the lost editability caused by the overfitted first abduction. Thanks to the second abduction, which exclusively encodes the visual transition from post-edit to pre-edit, its inversion -- subtracting the LoRA -- effectively reverts pre-edit back to post-edit, thereby accomplishing the edit. Through extensive experiments, our DAC achieves a good trade-off between editability and fidelity. Thus, we can support a wide spectrum of user editing intents, including addition, removal, manipulation, replacement, style transfer, and facial change, which are extensively validated in both qualitative and quantitative evaluations. Codes are in https://github.com/xuesong39/DAC.
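For intuition, below is a minimal, hedged sketch of the doubly abductive editing recipe described in the abstract, written as plain PyTorch. The helper `apply_lora` and the toy weights are hypothetical stand-ins, not the authors' actual API (see the repository above); only the workflow (fit a UNet LoRA, fit a text-encoder LoRA on top of it, then subtract the second LoRA at edit time) follows the abstract.

```python
# Conceptual sketch of DAC's double abduction, assuming a Stable-Diffusion-style
# pipeline with LoRA adapters. All names and shapes here are illustrative.
import torch

def apply_lora(weight: torch.Tensor, lora_A: torch.Tensor,
               lora_B: torch.Tensor, scale: float) -> torch.Tensor:
    """Return the base weight plus a scaled low-rank update B @ A."""
    return weight + scale * (lora_B @ lora_A)

# Abduction 1: fit a UNet LoRA so the model reconstructs the source image
#              from the source prompt (encodes all image details).
# Abduction 2: freeze the UNet LoRA, then fit a text-encoder LoRA so the
#              model reconstructs the source image from the target prompt
#              (this LoRA encodes the post-edit -> pre-edit transition).
# Editing:     keep the UNet LoRA at scale +1, but apply the text-encoder
#              LoRA at scale -1 (i.e. subtract it), which reverts the
#              pre-edit content toward the requested post-edit state.
unet_scale, text_scale = 1.0, -1.0

W_unet = torch.randn(64, 64)                     # stand-in UNet attention weight
A_u, B_u = torch.randn(4, 64), torch.randn(64, 4)  # rank-4 UNet LoRA factors
W_text = torch.randn(64, 64)                     # stand-in text-encoder weight
A_t, B_t = torch.randn(4, 64), torch.randn(64, 4)  # rank-4 text-encoder LoRA factors

edited_unet_weight = apply_lora(W_unet, A_u, B_u, unet_scale)
edited_text_weight = apply_lora(W_text, A_t, B_t, text_scale)
```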
Related papers
- CODE: Confident Ordinary Differential Editing [62.83365660727034]
Confident Ordinary Differential Editing (CODE) is a novel approach for image synthesis that effectively handles Out-of-Distribution (OoD) guidance images.
CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory.
Our method operates in a fully blind manner, relying solely on a pre-trained generative model.
arXiv Detail & Related papers (2024-08-22T14:12:20Z)
- TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and maintains a seamless workflow (less than 3 seconds on a single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z)
- Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method, named Forgedit.
First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds.
Then, we propose a novel vector projection mechanism in text embedding space of Diffusion Models.
arXiv Detail & Related papers (2023-09-19T12:05:26Z)
- Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion [35.95513392917737]
A novel approach called Dual-Cycle Diffusion generates an unbiased mask to guide image editing.
Our experiments demonstrate the effectiveness of the proposed method, as it significantly improves the D-CLIP score from 0.272 to 0.283.
arXiv Detail & Related papers (2023-02-05T14:30:22Z)
- Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can be intuitively composited from the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
- High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved.
With a low bit-rate latent code, previous works have difficulty preserving high-fidelity details in reconstructed and edited images.
We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
arXiv Detail & Related papers (2021-09-14T11:23:48Z)
- From Continuity to Editability: Inverting GANs with Consecutive Images [37.16137384683823]
Existing GAN inversion methods are stuck in a paradox: the inverted codes can either achieve high-fidelity reconstruction or retain editing capability, but not both.
In this paper, we resolve this paradox by introducing consecutive images into the inversion process.
Our method provides the first support of video-based GAN inversion, and an interesting application of unsupervised semantic transfer from consecutive images.
arXiv Detail & Related papers (2021-07-29T08:19:58Z)