Related papers: Doubly Abductive Counterfactual Inference for Text-based Image Editing

Doubly Abductive Counterfactual Inference for Text-based Image Editing

URL: http://arxiv.org/abs/2403.02981v2
Date: Tue, 26 Mar 2024 02:39:15 GMT
Title: Doubly Abductive Counterfactual Inference for Text-based Image Editing
Authors: Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang,
Abstract summary: We study text-based image editing (TBIE) of a single image by counterfactual inference. We propose a Doubly Abductive Counterfactual inference framework (DAC) Our DAC achieves a good trade-off between editability and fidelity.
Score: 130.46583155383735
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study text-based image editing (TBIE) of a single image by counterfactual inference because it is an elegant formulation to precisely address the requirement: the edited image should retain the fidelity of the original one. Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning. To this end, we propose a Doubly Abductive Counterfactual inference framework (DAC). We first parameterize an exogenous variable as a UNet LoRA, whose abduction can encode all the image details. Second, we abduct another exogenous variable parameterized by a text encoder LoRA, which recovers the lost editability caused by the overfitted first abduction. Thanks to the second abduction, which exclusively encodes the visual transition from post-edit to pre-edit, its inversion -- subtracting the LoRA -- effectively reverts pre-edit back to post-edit, thereby accomplishing the edit. Through extensive experiments, our DAC achieves a good trade-off between editability and fidelity. Thus, we can support a wide spectrum of user editing intents, including addition, removal, manipulation, replacement, style transfer, and facial change, which are extensively validated in both qualitative and quantitative evaluations. Codes are in https://github.com/xuesong39/DAC.

Related papers

Concept Lancet: Image Editing with Compositional Representation Transplant [58.9421919837084]
Concept Lancet is a zero-shot plug-and-play framework for principled representation manipulation in image editing. We decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. We perform a customized concept transplant process to impose the corresponding editing direction.
arXiv Detail & Related papers (2025-04-03T17:59:58Z)
Training-Free Text-Guided Image Editing with Visual Autoregressive Model [46.201510044410995]
We propose a novel text-guided image editing framework based on Visual AutoRegressive modeling. Our method eliminates the need for explicit inversion while ensuring precise and controlled modifications. Our framework operates in a training-free manner and achieves high-fidelity editing with faster inference speeds.
arXiv Detail & Related papers (2025-03-31T09:46:56Z)
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency [69.33072075580483]
We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training. Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency ( CEC) CEC applies forward and backward edits in one training step and enforces consistency in image and attention spaces.
arXiv Detail & Related papers (2024-12-19T18:59:58Z)
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing [43.082008983889956]
Most diffusion model-based methods use DDIM Inversion as the first stage before editing. We propose a new inversion and sampling method named Dual-Schedule Inversion. We also design a classifier to adaptively combine Dual-Schedule Inversion with different editing methods for user-friendly image editing.
arXiv Detail & Related papers (2024-12-15T11:04:06Z)
CODE: Confident Ordinary Differential Editing [62.83365660727034]
Confident Ordinary Differential Editing (CODE) is a novel approach for image synthesis that effectively handles Out-of-Distribution (OoD) guidance images. CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory. Our method operates in a fully blind manner, relying solely on a pre-trained generative model.
arXiv Detail & Related papers (2024-08-22T14:12:20Z)
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the edit-friendly'' DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing. Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes. InfEdit shows strong performance in various editing tasks and also maintains a seamless workflow (less than 3 seconds on one single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z)
Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method, named as Forgedit. First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds. Then, we propose a novel vector projection mechanism in text embedding space of Diffusion Models.
arXiv Detail & Related papers (2023-09-19T12:05:26Z)
Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion [35.95513392917737]
A novel approach called Dual-Cycle Diffusion generates an unbiased mask to guide image editing. Our experiments demonstrate the effectiveness of the proposed method, as it significantly improves the D-CLIP score from 0.272 to 0.283.
arXiv Detail & Related papers (2023-02-05T14:30:22Z)
Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm. With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images. In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved. With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images. We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
arXiv Detail & Related papers (2021-09-14T11:23:48Z)
From Continuity to Editability: Inverting GANs with Consecutive Images [37.16137384683823]
Existing GAN inversion methods are stuck in a paradox that the inverted codes can either achieve high-fidelity reconstruction, or retain the editing capability. In this paper, we resolve this paradox by introducing consecutive images into the inversion process. Our method provides the first support of video-based GAN inversion, and an interesting application of unsupervised semantic transfer from consecutive images.
arXiv Detail & Related papers (2021-07-29T08:19:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.