The Stable Artist: Steering Semantics in Diffusion Latent Space
- URL: http://arxiv.org/abs/2212.06013v3
- Date: Wed, 31 May 2023 15:17:54 GMT
- Title: The Stable Artist: Steering Semantics in Diffusion Latent Space
- Authors: Manuel Brack, Patrick Schramowski, Felix Friedrich, Dominik
Hintersdorf, Kristian Kersting
- Abstract summary: We present the Stable Artist, an image editing approach enabling fine-grained control of the image generation process.
The main component is semantic guidance (SEGA) which steers the diffusion process along variable numbers of semantic directions.
SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model.
- Score: 17.119616029527744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large, text-conditioned generative diffusion models have recently gained a
lot of attention for their impressive performance in generating high-fidelity
images from text alone. However, achieving high-quality results is almost
infeasible in a one-shot fashion. Instead, text-guided image generation
involves the user making many slight changes to inputs in order to iteratively
carve out the envisioned image. However, slight changes to the input prompt
often lead to entirely different images being generated, and thus the control
of the artist is limited in its granularity. To provide flexibility, we present
the Stable Artist, an image editing approach enabling fine-grained control of
the image generation process. The main component is semantic guidance (SEGA)
which steers the diffusion process along variable numbers of semantic
directions. This allows for subtle edits to images, changes in composition and
style, as well as optimization of the overall artistic conception. Furthermore,
SEGA enables probing of latent spaces to gain insights into the representation
of concepts learned by the model, even complex ones such as 'carbon emission'.
We demonstrate the Stable Artist on several tasks, showcasing high-quality
image editing and composition.
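To make the guidance mechanism above concrete, here is a minimal sketch of how a guided noise estimate can be assembled at each denoising step from an unconditional estimate, a prompt-conditioned estimate, and one extra estimate per semantic editing concept. This is an illustration under assumptions, not the paper's exact formulation: the function name, default scales, and the quantile-based masking rule are hypothetical.

```python
import numpy as np

def sega_style_guidance(eps_uncond, eps_prompt, eps_concepts,
                        guidance_scale=7.5, edit_scales=None,
                        edit_threshold=0.9):
    """Hypothetical sketch of semantic guidance, not the paper's exact rule.

    eps_uncond, eps_prompt: noise estimates from the diffusion model for the
        unconditional and the prompt-conditioned pass (same shape).
    eps_concepts: list of noise estimates, one per editing concept.
    edit_scales: signed strength per concept (negative pushes away from it).
    edit_threshold: quantile; only the strongest-responding latent elements
        of each concept direction are kept, so edits stay localized.
    """
    if edit_scales is None:
        edit_scales = [5.0] * len(eps_concepts)

    # Standard classifier-free guidance towards the prompt.
    eps = eps_uncond + guidance_scale * (eps_prompt - eps_uncond)

    # Add one masked semantic direction per editing concept.
    for eps_c, scale in zip(eps_concepts, edit_scales):
        direction = eps_c - eps_uncond
        cutoff = np.quantile(np.abs(direction), edit_threshold)
        mask = (np.abs(direction) >= cutoff).astype(direction.dtype)
        eps = eps + scale * mask * direction
    return eps

# Toy usage with random arrays standing in for U-Net outputs.
rng = np.random.default_rng(0)
shape = (4, 64, 64)
eps_u, eps_p = rng.normal(size=shape), rng.normal(size=shape)
eps_edit = [rng.normal(size=shape)]   # e.g. conditioned on "oil painting"
guided = sega_style_guidance(eps_u, eps_p, eps_edit, edit_scales=[4.0])
```

Because each concept contributes its own signed, masked direction, several edits can be combined in a single generation pass, e.g. strengthening one style while suppressing another, which is what the fine-grained, iterative control described above amounts to.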
Related papers
- Editable Image Elements for Controllable Synthesis [79.58148778509769]
We propose an image representation that promotes spatial editing of input images using a diffusion model.
We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition.
arXiv Detail & Related papers (2024-04-24T17:59:11Z)
- TextCraftor: Your Text Encoder Can be Image Quality Controller [65.27457900325462]
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation.
We propose a fine-tuning approach, TextCraftor, to enhance the performance of text-to-image diffusion models.
arXiv Detail & Related papers (2024-03-27T19:52:55Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion [34.662798793560995]
We present a simple yet highly effective approach to personalization using highly personalized (HiPer) text embeddings.
Our method does not require model fine-tuning or identifiers, yet still enables manipulation of background, texture, and motion with just a single image and target text.
arXiv Detail & Related papers (2023-03-15T17:07:45Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process (see the sketch at the end of this page).
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- SEGA: Instructing Text-to-Image Models using Semantic Guidance [33.080261792998826]
We show how to interact with the diffusion process to flexibly steer it along semantic directions.
SEGA generalizes to any generative architecture using classifier-free guidance.
It allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception.
arXiv Detail & Related papers (2023-01-28T16:43:07Z)
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is a mechanism that automatically generates a mask highlighting regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
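For the cross-attention guidance mentioned in the pix2pix-zero entry above, the following sketch illustrates the general idea: at each denoising step the latent is nudged so that the cross-attention maps produced under the edited prompt stay close to those recorded while reconstructing the original image. The tensor shapes, step size, and the stand-in attention function are assumptions; the real method hooks the U-Net's cross-attention layers.

```python
import torch

def cross_attention_guidance_step(latent, attn_edit_fn, attn_ref,
                                  step_size=0.1):
    """Hypothetical sketch of a single cross-attention guidance step.

    latent:       current diffusion latent.
    attn_edit_fn: callable returning the cross-attention maps obtained when
                  denoising this latent with the edited prompt (a stand-in
                  for hooking the real U-Net attention layers).
    attn_ref:     cross-attention maps recorded for the original image; the
                  edit should keep this spatial layout.
    """
    latent = latent.detach().requires_grad_(True)
    attn_edit = attn_edit_fn(latent)
    # Penalize deviation from the reference attention maps so the edited
    # image keeps the structure of the input image.
    loss = torch.nn.functional.mse_loss(attn_edit, attn_ref)
    grad, = torch.autograd.grad(loss, latent)
    # Nudge the latent towards attention maps matching the original layout.
    return (latent - step_size * grad).detach()

# Toy usage with random tensors standing in for real model outputs.
torch.manual_seed(0)
latent = torch.randn(1, 4, 64, 64)
attn_ref = torch.rand(8, 77, 64)                       # heads x tokens x pixels
fake_unet = torch.nn.Linear(4 * 64 * 64, 8 * 77 * 64)  # stand-in for the U-Net
attn_edit_fn = lambda z: fake_unet(z.flatten(1)).reshape(8, 77, 64)
latent = cross_attention_guidance_step(latent, attn_edit_fn, attn_ref)
```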