Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models
- URL: http://arxiv.org/abs/2305.15779v1
- Date: Thu, 25 May 2023 06:46:28 GMT
- Title: Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models
- Authors: Jooyoung Choi, Yunjey Choi, Yunji Kim, Junho Kim, Sungroh Yoon
- Abstract summary: Text-to-image diffusion models can generate diverse, high-fidelity images based on user-provided text prompts.
We propose Custom-Edit, in which we (i) customize a diffusion model with a few reference images and then (ii) perform text-guided editing.
- Score: 26.92450293675906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image diffusion models can generate diverse, high-fidelity images
based on user-provided text prompts. Recent research has extended these models
to support text-guided image editing. While text guidance is an intuitive
editing interface for users, it often fails to capture the precise concept a
user intends. To address this issue, we propose Custom-Edit, in which we
(i) customize a diffusion model with a few reference images and then (ii)
perform text-guided editing. Our key discovery is that customizing only
language-relevant parameters with augmented prompts improves reference
similarity significantly while maintaining source similarity. Moreover, we
provide our recipe for each customization and editing process. We compare
popular customization methods and validate our findings on two editing methods
using various datasets.
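To make the customization recipe concrete, below is a minimal sketch (not the authors' released code) of the key step the abstract describes: freezing the diffusion UNet and fine-tuning only its language-relevant parameters, i.e. the cross-attention key/value projections that consume text-encoder states. It assumes the Hugging Face diffusers library; the model id is illustrative.
```python
# Minimal sketch, assuming diffusers-style naming: customize only the
# cross-attention K/V projections ("attn2" modules) of the UNet.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = pipe.unet

# Freeze everything first.
unet.requires_grad_(False)

# Unfreeze only the cross-attention key/value projections, which map
# text-encoder hidden states into the image UNet.
trainable = []
for name, module in unet.named_modules():
    if name.endswith("attn2"):
        for proj in (module.to_k, module.to_v):
            proj.requires_grad_(True)
            trainable += list(proj.parameters())

optimizer = torch.optim.AdamW(trainable, lr=1e-5)
print(f"training {sum(p.numel() for p in trainable):,} parameters")
```
In the spirit of the paper's setup, these unfrozen projections would then be trained on the few reference images with augmented prompts such as "a photo of a V* [class]", where "V*" is a new modifier token; at editing time, the source word in the edit prompt is swapped for "V*".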
Related papers
- ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models [11.830273909934688]
Modern text-to-image (T2I) diffusion models have revolutionized image editing by enabling the generation of high-quality images.
We propose ReEdit, a modular and efficient end-to-end framework that captures edits in both text and image modalities.
Our results demonstrate that ReEdit consistently outperforms contemporary approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-06T15:19:24Z)
- DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models [7.418186319496487]
Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts.
A promising extension is personalized editing, namely to edit an image using personalized concepts.
We propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods.
arXiv Detail & Related papers (2024-10-15T02:50:54Z)
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to modify a given synthetic or real image to meet specific user requirements.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
- Localizing and Editing Knowledge in Text-to-Image Generative Models [62.02776252311559]
Knowledge about different attributes is not localized in isolated components, but is instead distributed among a set of components in the conditional UNet.
We introduce Diff-QuickFix, a fast, data-free model editing method that can effectively edit concepts in text-to-image models.
arXiv Detail & Related papers (2023-10-20T17:31:12Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage; a sketch of this two-stage idea follows this entry.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
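As a rough illustration of the two-stage idea summarized above (a hedged sketch, not the paper's exact algorithm or API): stage one learns a prompt embedding that reproduces the DDIM-inversion trajectory of the source image, and stage two interpolates it with the target prompt embedding. `eps_model`, `ddim_step`, and `traj` are placeholders for whatever diffusion codebase is in use.
```python
# Hedged sketch of prompt-tuning inversion. traj[t] are latents from
# DDIM inversion of the source image (traj[0] is the image latent).
import torch

def reconstruction_stage(eps_model, ddim_step, traj, c_init, steps=500, lr=1e-2):
    c = c_init.clone().requires_grad_(True)   # learnable prompt embedding
    opt = torch.optim.Adam([c], lr=lr)
    for _ in range(steps):
        t = torch.randint(1, len(traj), (1,)).item()
        # One conditional DDIM step from traj[t] should land on traj[t-1].
        z_prev = ddim_step(eps_model(traj[t], t, c), traj[t], t)
        loss = torch.nn.functional.mse_loss(z_prev, traj[t - 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return c.detach()

def editing_embedding(c_learned, c_target, eta=0.7):
    # Linear interpolation trades source fidelity against edit strength.
    return eta * c_target + (1.0 - eta) * c_learned
```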
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that preserves the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method needs no additional training for these edits and can directly use an existing text-to-image diffusion model (a sketch of the guidance step follows this entry).
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
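A hedged sketch of the cross-attention guidance step described above (not the authors' code; `get_attn_maps` stands in for whatever hook collects attention maps in a given codebase): at each denoising step, the latent is nudged so the current cross-attention maps stay close to reference maps recorded while reconstructing the input image.
```python
# Hedged sketch of cross-attention guidance for a single denoising step.
import torch

def guided_step(unet, z_t, t, c_edit, ref_maps, get_attn_maps, lam=0.1):
    z = z_t.detach().requires_grad_(True)
    # First pass: run the UNet so the hooks record the current maps.
    _ = unet(z, t, encoder_hidden_states=c_edit).sample
    maps = get_attn_maps(unet)
    # Penalize deviation from the reference (reconstruction) maps.
    loss = sum(torch.nn.functional.mse_loss(m, r)
               for m, r in zip(maps, ref_maps))
    grad, = torch.autograd.grad(loss, z)
    z_guided = z - lam * grad                  # keep the input's structure
    # Second pass: noise prediction at the guided latent, fed to the sampler.
    return unet(z_guided, t, encoder_hidden_states=c_edit).sample
```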
- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [0.0]
We propose an optimization-free, zero-fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We demonstrate our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method that leverages text-conditioned diffusion models for semantic image editing.
Our main contribution is a procedure that automatically generates a mask highlighting the regions of the input image that need to be edited (sketched after this entry).
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
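A minimal sketch of the mask-inference idea summarized above, under the assumption (hedged, not the paper's exact procedure) that the mask is derived by contrasting noise estimates under the source and edit captions; `eps_model` and the simplified `add_noise` schedule are placeholders.
```python
# Hedged sketch of DiffEdit-style mask inference: where the noise
# predicted under the edit caption disagrees with the noise predicted
# under the source caption, the image likely needs to change.
import torch

def add_noise(z0, noise, alpha_bar=0.5):
    # Simplified forward diffusion q(z_t | z0); real schedules vary with t.
    return alpha_bar ** 0.5 * z0 + (1 - alpha_bar) ** 0.5 * noise

@torch.no_grad()
def infer_edit_mask(eps_model, z0, c_source, c_edit, t,
                    n_samples=10, threshold=0.5):
    diffs = []
    for _ in range(n_samples):
        noise = torch.randn_like(z0)
        z_t = add_noise(z0, noise)
        d = eps_model(z_t, t, c_edit) - eps_model(z_t, t, c_source)
        diffs.append(d.abs().mean(dim=1, keepdim=True))  # average channels
    m = torch.stack(diffs).mean(dim=0)                   # average samples
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)       # normalize to [0, 1]
    return (m > threshold).float()                       # binary edit mask
```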
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN's training data, as in the sketch below.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
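As a hedged illustration of how such edits compose (consistent with the summary above, not EditGAN's released code): edits live as vectors in the GAN's latent space, so multiple learned edits combine by scaled addition before decoding. `generator`, `w`, and the edit vectors are placeholders.
```python
# Hedged sketch: composing latent-space edit vectors in a GAN.
import torch

def apply_edits(generator, w, edit_vectors, scales):
    """w: latent code of the source image (e.g. from GAN inversion)."""
    w_edit = w.clone()
    for delta, alpha in zip(edit_vectors, scales):
        w_edit = w_edit + alpha * delta   # compose edits linearly
    return generator(w_edit)              # decode the edited image

# e.g. apply a smile edit at full strength and a gaze edit at half:
# img = apply_edits(G, w, [delta_smile, delta_gaze], [1.0, 0.5])
```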
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.