Text-to-image Editing by Image Information Removal
- URL: http://arxiv.org/abs/2305.17489v2
- Date: Tue, 7 Nov 2023 19:22:36 GMT
- Title: Text-to-image Editing by Image Information Removal
- Authors: Zhongping Zhang, Jian Zheng, Jacob Zhiyuan Fang, Bryan A. Plummer
- Abstract summary: We propose a text-to-image editing model with an Image Information Removal module (IIR) that selectively erases color-related and texture-related information from the original image.
Experiments on CUB, Outdoor Scenes, and COCO show the best editability-fidelity trade-off, and a user study on COCO finds our edited images are preferred 35% more often than prior work.
- Score: 19.464349486031566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have demonstrated impressive performance in text-guided
image generation. Current methods that leverage the knowledge of these models
for image editing either fine-tune them using the input image (e.g., Imagic) or
incorporate structure information as additional constraints (e.g., ControlNet).
However, fine-tuning large-scale diffusion models on a single image can lead to
severe overfitting issues and lengthy inference time. Information leakage from
pretrained models also makes it challenging to preserve image content not
related to the text input. Additionally, methods that incorporate structural
guidance (e.g., edge maps, semantic maps, keypoints) struggle to retain
attributes such as colors and textures. Using the input image as a control could
mitigate these issues, but since these models are trained via reconstruction, a
model can simply hide information about the original image when encoding it to
perfectly reconstruct the image without learning the editing task. To address
these challenges, we propose a text-to-image editing model with an Image
Information Removal module (IIR) that selectively erases color-related and
texture-related information from the original image, allowing us to better
preserve the text-irrelevant content and avoid issues arising from information
hiding. Our experiments on CUB, Outdoor Scenes, and COCO show that our approach
achieves the best editability-fidelity trade-off. In addition, a user
study on COCO shows that our edited images are preferred 35% more often than
prior work.
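To make the information-removal idea concrete, below is a minimal sketch that approximates the IIR module with two fixed image operations: grayscale conversion to erase color and Gaussian blur to erase texture. The function name, parameters, and these specific operations are illustrative assumptions; the paper's actual IIR module is a learned component and may remove information differently.

```python
from PIL import Image, ImageFilter

def remove_image_information(image: Image.Image,
                             drop_color: bool = True,
                             blur_radius: float = 4.0) -> Image.Image:
    """Hypothetical stand-in for an image-information-removal step.

    Erases color (grayscale conversion) and texture (Gaussian blur) from the
    control image, so a reconstruction-trained editor cannot simply copy the
    input and must take the edited appearance from the text prompt instead.
    """
    out = image
    if drop_color:
        out = out.convert("L").convert("RGB")  # drop chroma, keep luminance/structure
    if blur_radius > 0:
        out = out.filter(ImageFilter.GaussianBlur(blur_radius))  # smooth away fine texture
    return out

# Usage: the degraded image conditions the editor in place of the raw input,
# while the text prompt supplies the new colors and textures.
control = remove_image_information(Image.open("bird.jpg"))
```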
Related papers
- DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images [55.546024767130994]
We propose a novel model to enhance the text-based control of an image editor by explicitly reasoning about which parts of the image to alter or preserve.
It relies on word alignments between a description of the original source image and the instruction that reflects the needed updates, as well as on the input image.
It is evaluated on a subset of the Bison dataset and a self-defined dataset dubbed Dream.
arXiv Detail & Related papers (2024-04-27T22:45:47Z) - Localizing and Editing Knowledge in Text-to-Image Generative Models [62.02776252311559]
Knowledge about different attributes is not localized in isolated components, but is instead distributed amongst a set of components in the conditional UNet.
We introduce a fast, data-free model editing method Diff-QuickFix which can effectively edit concepts in text-to-image models.
arXiv Detail & Related papers (2023-10-20T17:31:12Z) - Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method named Forgedit.
First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds.
Then, we propose a novel vector projection mechanism in text embedding space of Diffusion Models.
arXiv Detail & Related papers (2023-09-19T12:05:26Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - ReGeneration Learning of Diffusion Models with Rich Prompts for
Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images.
Current models can introduce significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image Diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - SINE: SINgle Image Editing with Text-to-Image Diffusion Models [10.67527134198167]
This work aims to address the problem of single-image editing.
We propose a novel model-based guidance built upon classifier-free guidance.
We show promising editing capabilities, including changing style, content addition, and object manipulation.
arXiv Detail & Related papers (2022-12-08T18:57:13Z) - ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z) - COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content
Conditioned Style Encoder [70.23358875904891]
Unsupervised image-to-image translation aims to learn a mapping of an image in a given domain to an analogous image in a different domain.
We propose a new few-shot image translation model, COCO-FUNIT, which computes the style embedding of the example images conditioned on the input image.
Our model shows effectiveness in addressing the content loss problem.
arXiv Detail & Related papers (2020-07-15T02:01:14Z)