Manipulating Embeddings of Stable Diffusion Prompts
- URL: http://arxiv.org/abs/2308.12059v2
- Date: Sat, 22 Jun 2024 16:58:19 GMT
- Title: Manipulating Embeddings of Stable Diffusion Prompts
- Authors: Niklas Deckers, Julia Peters, Martin Potthast
- Abstract summary: We propose and analyze a new method to manipulate the embedding of a prompt instead of the prompt text.
Our user study shows that these methods are considered less tedious and that the resulting images are often preferred.
- Score: 22.10069408287608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt engineering is still the primary way for users of generative text-to-image models to manipulate generated images in a targeted way. Based on treating the model as a continuous function and by passing gradients between the image space and the prompt embedding space, we propose and analyze a new method to directly manipulate the embedding of a prompt instead of the prompt text. We then derive three practical interaction tools to support users with image generation: (1) Optimization of a metric defined in the image space that measures, for example, the image style. (2) Supporting a user in creative tasks by allowing them to navigate in the image space along a selection of directions of "near" prompt embeddings. (3) Changing the embedding of the prompt to include information that a user has seen in a particular seed but has difficulty describing in the prompt. Compared to prompt engineering, user-driven prompt embedding manipulation enables a more fine-grained, targeted control that integrates a user's intentions. Our user study shows that our methods are considered less tedious and that the resulting images are often preferred.
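The core idea is to treat the generative model as a differentiable function from the prompt embedding to the image, so the embedding can be updated by gradient descent on a metric defined in image space (tool 1). The sketch below is a minimal, illustrative stand-in under assumed names, not the authors' implementation: ToyGenerator and image_metric are hypothetical placeholders for a differentiable Stable Diffusion pipeline and a real style or preference metric.

```python
# Minimal PyTorch sketch of gradient-based prompt-embedding optimization.
# ToyGenerator and image_metric are illustrative stand-ins (assumptions);
# in the paper's setting they would be a differentiable Stable Diffusion
# pipeline and a style/score metric defined in image space.
import torch

torch.manual_seed(0)

class ToyGenerator(torch.nn.Module):
    """Stand-in for the text-to-image model: prompt embedding -> image."""
    def __init__(self, embed_dim: int = 768, image_pixels: int = 3 * 64 * 64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(embed_dim, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, image_pixels),
            torch.nn.Sigmoid(),
        )

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(prompt_embedding).view(-1, 3, 64, 64)

def image_metric(image: torch.Tensor) -> torch.Tensor:
    """Stand-in for a differentiable image-space metric; here, mean brightness."""
    return image.mean()

generator = ToyGenerator()

# Start from the embedding of the user's prompt (random here for illustration)
# and optimize the embedding directly; the prompt text itself is never changed.
prompt_embedding = torch.randn(1, 768, requires_grad=True)
optimizer = torch.optim.Adam([prompt_embedding], lr=1e-2)

for step in range(100):
    optimizer.zero_grad()
    image = generator(prompt_embedding)
    loss = -image_metric(image)   # maximize the image-space metric
    loss.backward()               # gradients flow from image space to the embedding
    optimizer.step()

print(f"final metric: {image_metric(generator(prompt_embedding)).item():.3f}")
```

The same machinery would plausibly support tools (2) and (3), for example by stepping the embedding along directions toward "near" prompt embeddings, or toward an embedding recovered from a particular seed, rather than along the metric's gradient.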
Related papers
- Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models [46.18013380882767]
This work focuses on inverting the diffusion model to obtain interpretable language prompts directly.
We leverage the finding that different timesteps of the diffusion process cater to different levels of detail in an image.
We show that our approach can identify semantically interpretable and meaningful prompts for a target image.
arXiv Detail & Related papers (2023-12-19T18:47:30Z)
- User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques [32.82206298102458]
Text-driven image editing has shown remarkable success in diffusion models.
The existing methods assume that the user's description sufficiently grounds the contexts in the source image.
We propose simple yet effective methods by combining prompt generation frameworks.
arXiv Detail & Related papers (2023-06-05T09:09:10Z)
- Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
arXiv Detail & Related papers (2023-04-18T22:59:11Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Optimizing Prompts for Text-to-Image Generation [97.61295501273288]
Well-designed prompts can guide text-to-image models to generate amazing images.
However, such performant prompts are often model-specific and misaligned with user input.
We propose prompt adaptation, a framework that automatically adapts original user input to model-preferred prompts.
arXiv Detail & Related papers (2022-12-19T16:50:41Z)
- Interactive Image Manipulation with Complex Text Instructions [14.329411711887115]
We propose a novel image manipulation method that interactively edits an image using complex text instructions.
It allows users not only to improve the accuracy of image manipulation but also to achieve complex tasks such as enlarging, shrinking, or removing objects.
Extensive experiments on the Caltech-UCSD Birds-200-2011 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate that our proposed method enables interactive, flexible, and accurate image manipulation in real time.
arXiv Detail & Related papers (2022-11-25T08:05:52Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
- Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal, including (1) a reference image and (2) a natural-language instruction that describes the desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)
- Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions [66.82547612097194]
We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions.
Our approach takes advantage of the unified visual-semantic embedding space pretrained on a general image-caption dataset.
We show promising results in manipulating open-vocabulary color, texture, and high-level attributes for various scenarios of open-domain images.
arXiv Detail & Related papers (2020-08-04T14:15:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.