SINE: SINgle Image Editing with Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2212.04489v1
- Date: Thu, 8 Dec 2022 18:57:13 GMT
- Title: SINE: SINgle Image Editing with Text-to-Image Diffusion Models
- Authors: Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren
- Abstract summary: This work aims to address the problem of single-image editing.
We propose a novel model-based guidance built upon the classifier-free guidance.
We show promising editing capabilities, including changing style, content addition, and object manipulation.
- Score: 10.67527134198167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on diffusion models have demonstrated a strong capability for
conditional image generation, e.g., text-guided image synthesis. Such success
inspires many efforts trying to use large-scale pre-trained diffusion models
for tackling a challenging problem--real image editing. Works conducted in this
area learn a unique textual token corresponding to several images containing
the same object. However, under many circumstances, only one image is
available, such as the painting of the Girl with a Pearl Earring. Using
existing fine-tuning methods on the pre-trained diffusion models with a single
image causes severe overfitting issues. The information leakage from the
pre-trained diffusion models prevents the edited result from keeping the same
content as the given image while creating new features depicted by the
language guidance. This
work aims to address the problem of single-image editing. We propose a novel
model-based guidance built upon the classifier-free guidance so that the
knowledge from the model trained on a single image can be distilled into the
pre-trained diffusion model, enabling content creation even with one given
image. Additionally, we propose a patch-based fine-tuning that can effectively
help the model generate images of arbitrary resolution. We provide extensive
experiments to validate the design choices of our approach and show promising
editing capabilities, including changing style, content addition, and object
manipulation. The code is available for research purposes at
https://github.com/zhang-zx/SINE.git .
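For a concrete sense of the model-based guidance, the short Python sketch below shows one plausible way the noise predictions from the single-image fine-tuned model and the frozen pre-trained model could be blended inside classifier-free guidance at each denoising step. The function name, the interpolation weight v, and the default values are illustrative assumptions, not the paper's notation; the exact formulation is given in the paper and the linked repository.

    import torch

    def model_based_cfg(eps_uncond: torch.Tensor,
                        eps_pretrained: torch.Tensor,
                        eps_finetuned: torch.Tensor,
                        guidance_scale: float = 7.5,
                        v: float = 0.7) -> torch.Tensor:
        """Blend noise predictions at one denoising step (illustrative sketch).

        eps_uncond:     pre-trained model, null/empty prompt (unconditional branch)
        eps_pretrained: pre-trained model, conditioned on the editing prompt
        eps_finetuned:  single-image fine-tuned model, conditioned on the source prompt
        v:              interpolation weight between the two conditional branches
                        (assumed name and default, for illustration only)
        """
        # Interpolate the conditional predictions so knowledge from the
        # single-image model is distilled into the sampling trajectory of the
        # pre-trained model, then apply the usual classifier-free extrapolation.
        eps_cond = v * eps_finetuned + (1.0 - v) * eps_pretrained
        return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # Toy shapes only; in practice these come from UNet forward passes on the
    # same noisy latent z_t at the same timestep t.
    z = torch.randn(1, 4, 64, 64)
    eps = model_based_cfg(z, torch.randn_like(z), torch.randn_like(z))

A timestep-dependent schedule (for example, leaning on the fine-tuned model in the early, structure-defining steps and on the pre-trained model later) is a natural refinement; see the paper for the actual design choices and ablations.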
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models [53.17454737232668]
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts.
These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions.
We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D.
arXiv Detail & Related papers (2023-12-21T12:11:00Z)
- Unified Concept Editing in Diffusion Models [53.30378722979958]
We present a method that tackles issues such as bias, copyrighted artistic styles, and offensive content with a single approach.
Our method, Unified Concept Editing (UCE), edits the model without training using a closed-form solution.
We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections.
arXiv Detail & Related papers (2023-08-25T17:59:59Z)
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z)
- DiffUTE: Universal Text Editing Diffusion Model [32.384236053455]
We propose a universal self-supervised text editing diffusion model (DiffUTE).
It aims to replace or modify words in the source image while maintaining a realistic appearance.
Our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.
arXiv Detail & Related papers (2023-05-18T09:06:01Z)
- ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated an amazing ability to synthesize diverse and high-fidelity images.
However, current models can introduce significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image Diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z)
- Pix2Video: Video Editing using Image Diffusion [43.07444438561277]
We investigate how to use pre-trained image models for text-guided video editing.
Our method works in two simple steps: first, we use a pre-trained structure-guided (e.g., depth) image diffusion model to perform text-guided edits on an anchor frame; then, we progressively propagate the changes to the remaining frames.
We demonstrate that realistic text-guided video edits are possible, without any compute-intensive preprocessing or video-specific finetuning.
arXiv Detail & Related papers (2023-03-22T16:36:10Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.