Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image
Guidance
- URL: http://arxiv.org/abs/2401.02126v1
- Date: Thu, 4 Jan 2024 08:21:30 GMT
- Title: Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image
Guidance
- Authors: Jiacheng Wang, Ping Liu, Wei Xu
- Abstract summary: We present a versatile image editing framework capable of executing both rigid and non-rigid edits.
We leverage a dual-path injection scheme to handle diverse editing scenarios.
We introduce an integrated self-attention mechanism for fusion of appearance and structural information.
- Score: 15.130419159003816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing text-to-image editing methods tend to excel either in rigid or
non-rigid editing but encounter challenges when combining both, resulting in
misaligned outputs with the provided text prompts. In addition, integrating
reference images for control remains challenging. To address these issues, we
present a versatile image editing framework capable of executing both rigid and
non-rigid edits, guided by either textual prompts or reference images. We
leverage a dual-path injection scheme to handle diverse editing scenarios and
introduce an integrated self-attention mechanism for fusion of appearance and
structural information. To mitigate potential visual artifacts, we further
employ latent fusion techniques to adjust intermediate latents. Compared to
previous work, our approach represents a significant advance in achieving
precise and versatile image editing. Comprehensive experiments validate the
efficacy of our method, showcasing competitive or superior results in
text-based editing and appearance transfer tasks, encompassing both rigid and
non-rigid settings.
Related papers
- UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency [69.33072075580483]
We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training.
Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency ( CEC)
CEC applies forward and backward edits in one training step and enforces consistency in image and attention spaces.
arXiv Detail & Related papers (2024-12-19T18:59:58Z) - Prompt Augmentation for Self-supervised Text-guided Image Manipulation [34.01939157351624]
We introduce prompt augmentation, a method amplifying a single input prompt into several target prompts, strengthening textual context and enabling localised image editing.
We propose a Contrastive Loss tailored to driving effective image editing by displacing edited areas and drawing preserved regions closer.
New losses are incorporated to the diffusion model, demonstrating improved or competitive image editing results on public datasets and generated images over state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-17T16:54:05Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance [13.535394339438428]
Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications.
We propose a zero-shot image editing method, named textbfEnhance textbfEditability for text-based image textbfEditing via textbfCLIP guidance.
arXiv Detail & Related papers (2024-03-15T09:26:48Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing to adjust the influence of each loss function, we build a flexible editing solution that can be adjusted to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.
They either finetune the model, or invert the image in the latent space of the pretrained model.
They suffer from two problems: Unsatisfying results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.