LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models
- URL: http://arxiv.org/abs/2210.02249v1
- Date: Wed, 5 Oct 2022 13:26:15 GMT
- Title: LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models
- Authors: Paramanand Chandramouli, Kanchana Vaishnavi Gandikota
- Abstract summary: Generic image manipulation using a single model with flexible text inputs is highly desirable.
Recent work addresses this task by guiding generative models trained on generic image datasets using pretrained vision-language encoders.
We propose an optimization-free method for the task of generic image manipulation from text prompts.
- Score: 12.06277444740134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research in vision-language models has seen rapid developments of late,
enabling natural language-based interfaces for image generation and
manipulation. Many existing text guided manipulation techniques are restricted
to specific classes of images, and often require fine-tuning to transfer to a
different style or domain. Nevertheless, generic image manipulation using a
single model with flexible text inputs is highly desirable. Recent work
addresses this task by guiding generative models trained on generic image
datasets using pretrained vision-language encoders. While promising, this
approach requires expensive optimization for each input. In this work, we
propose an optimization-free method for the task of generic image manipulation
from text prompts. Our approach exploits recent Latent Diffusion Models (LDM)
for text to image generation to achieve zero-shot text guided manipulation. We
employ a deterministic forward diffusion in a lower dimensional latent space,
and the desired manipulation is achieved by simply providing the target text to
condition the reverse diffusion process. We refer to our approach as LDEdit. We
demonstrate the applicability of our method on semantic image manipulation and
artistic style transfer. Our method can accomplish image manipulation on
diverse domains and enables editing multiple attributes in a straightforward
fashion. Extensive experiments demonstrate the benefit of our approach over
competing baselines.
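The pipeline described in the abstract (encode the image to a latent, run a deterministic DDIM-style forward diffusion, then run reverse diffusion conditioned on the target text) can be summarized in a short sketch. The code below is illustrative only: `encode`, `decode`, `embed_text`, and `eps_model` are hypothetical stand-ins for a pretrained latent diffusion model's autoencoder, text encoder, and noise predictor, and conditioning the inversion pass on a source prompt is an assumption of this sketch, not a detail stated in the abstract.

```python
# Minimal sketch of the LDEdit idea: deterministic forward diffusion of an image
# latent (DDIM inversion), then reverse DDIM conditioned on the target prompt.
# `encode`, `decode`, `embed_text`, and `eps_model` are hypothetical stand-ins
# for a pretrained latent diffusion model's components, not the authors' code.
import torch


def ddim_forward(z0, timesteps, alphas_cumprod, eps_model, cond_emb):
    """Deterministically map a clean latent z0 to a noisy latent z_T (eta = 0)."""
    z = z0
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):
        a_prev, a_t = alphas_cumprod[t_prev], alphas_cumprod[t]
        eps = eps_model(z, t_prev, cond_emb)                # predicted noise at the current step
        z0_pred = (z - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()
        z = a_t.sqrt() * z0_pred + (1 - a_t).sqrt() * eps   # deterministic DDIM update
    return z


def ddim_reverse(zT, timesteps, alphas_cumprod, eps_model, cond_emb):
    """Run reverse DDIM from z_T, conditioned on the target text embedding."""
    z = zT
    for t, t_prev in zip(reversed(timesteps[1:]), reversed(timesteps[:-1])):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = eps_model(z, t, cond_emb)
        z0_pred = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        z = a_prev.sqrt() * z0_pred + (1 - a_prev).sqrt() * eps
    return z


def ldedit(image, src_prompt, tgt_prompt, encode, decode, embed_text,
           eps_model, alphas_cumprod, num_steps=50):
    """Optimization-free edit: invert the latent, then regenerate with the target prompt."""
    timesteps = torch.linspace(0, len(alphas_cumprod) - 1, num_steps).long().tolist()
    z0 = encode(image)                                      # image -> lower-dimensional latent
    # Conditioning the inversion on the source prompt is an assumption of this sketch.
    zT = ddim_forward(z0, timesteps, alphas_cumprod, eps_model, embed_text(src_prompt))
    z_edit = ddim_reverse(zT, timesteps, alphas_cumprod, eps_model, embed_text(tgt_prompt))
    return decode(z_edit)                                   # decode the edited latent to an image
```

Because both passes are deterministic, no per-image optimization or fine-tuning is required; only the conditioning text changes between the forward and reverse passes.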
Related papers
- De-Diffusion Makes Text a Strong Cross-Modal Interface [33.90004746543745]
We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding.
Experiments validate the precision and comprehensiveness of De-Diffusion text representing images.
A single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools.
arXiv Detail & Related papers (2023-11-01T16:12:40Z)
- Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z)
- Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models [33.993466872389085]
We develop a novel algorithm that learns image manipulations 4.5-10 times faster and applies them 8 times faster.
Our approach can adapt the pretrained model to the user-specified image and text description on the fly in just 4 seconds.
arXiv Detail & Related papers (2023-04-10T01:21:56Z)
- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [0.0]
We propose an optimization-free and zero fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We prove our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
- StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators [63.85888518950824]
We present a text-driven method that allows shifting a generative model to new domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains.
arXiv Detail & Related papers (2021-08-02T14:46:46Z)
- Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt (a minimal sketch of this optimization pattern appears after this list).
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
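Several of the entries above (StyleCLIP's CLIP-based loss on an input latent vector, FlexIT's iterative transform toward a target point in CLIP embedding space) share the same per-image optimization pattern. The sketch below illustrates only that generic pattern, under assumed stand-ins `generator` and `clip_score`; it is not the code of either paper.

```python
# Illustrative sketch of the CLIP-guided latent optimization pattern shared by the
# StyleCLIP and FlexIT entries above: gradient steps on a latent code so the generated
# image matches a text prompt under a CLIP-style similarity, with an L2 term keeping
# the result close to the source latent. `generator` and `clip_score` are assumed
# stand-ins, not any specific library's API.
import torch


def clip_guided_edit(w_init, text_prompt, generator, clip_score,
                     steps=200, lr=0.05, lambda_l2=0.01):
    """Per-image optimization of a latent code toward a text prompt."""
    w = w_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        image = generator(w)                          # differentiable image synthesis
        sim = clip_score(image, text_prompt)          # image-text similarity to maximize
        loss = -sim + lambda_l2 * (w - w_init).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return generator(w).detach()
```

This per-input optimization loop is exactly the cost that optimization-free approaches such as LDEdit and Direct Inversion avoid.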