Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
- URL: http://arxiv.org/abs/2311.12092v2
- Date: Mon, 27 Nov 2023 08:29:54 GMT
- Title: Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
- Authors: Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba,
David Bau
- Abstract summary: We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
A slider is created using a small set of prompts or sample images.
Our method can help address persistent quality issues in Stable Diffusion XL including repair of object deformations and fixing distorted hands.
- Score: 52.894213114914805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method to create interpretable concept sliders that enable
precise control over attributes in image generations from diffusion models. Our
approach identifies a low-rank parameter direction corresponding to one concept
while minimizing interference with other attributes. A slider is created using
a small set of prompts or sample images; thus slider directions can be created
for either textual or visual concepts. Concept Sliders are plug-and-play: they
can be composed efficiently and continuously modulated, enabling precise
control over image generation. In quantitative experiments comparing to
previous editing techniques, our sliders exhibit stronger targeted edits with
lower interference. We showcase sliders for weather, age, styles, and
expressions, as well as slider compositions. We show how sliders can transfer
latents from StyleGAN for intuitive editing of visual concepts for which
textual description is difficult. We also find that our method can help address
persistent quality issues in Stable Diffusion XL including repair of object
deformations and fixing distorted hands. Our code, data, and trained sliders
are available at https://sliders.baulab.info/
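To make the mechanism concrete, here is a minimal PyTorch sketch (not the authors' released code) of one way a concept slider can be realized: a low-rank update on a frozen layer, continuously modulated by a scalar slider value and composed additively with other sliders. All class and variable names here are illustrative.

```python
import torch
import torch.nn as nn

class ConceptSlider(nn.Module):
    """One concept as a low-rank (LoRA) direction on a frozen linear layer."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        out_f, in_f = base.weight.shape
        self.down = nn.Parameter(torch.randn(rank, in_f) * 1e-3)  # A
        self.up = nn.Parameter(torch.zeros(out_f, rank))          # B; zero init => no-op at scale 0

    def delta(self, x: torch.Tensor) -> torch.Tensor:
        # the low-rank update B(Ax) for input x
        return x @ self.down.T @ self.up.T


def forward_with_sliders(base: nn.Linear, x: torch.Tensor, sliders, scales):
    """Frozen base output plus a weighted sum of concept directions.

    Each scale continuously modulates its concept, and sliders compose
    additively, mirroring the plug-and-play behavior described above.
    """
    out = base(x)
    for slider, scale in zip(sliders, scales):
        out = out + scale * slider.delta(x)
    return out


# Compose a hypothetical "age" and a "weather" slider on one projection layer.
layer = nn.Linear(768, 768).requires_grad_(False)
age, weather = ConceptSlider(layer), ConceptSlider(layer)
y = forward_with_sliders(layer, torch.randn(1, 768), [age, weather], [1.5, -0.5])
```

Zero-initializing the up-projection makes every slider a no-op at scale 0, so attaching sliders leaves the base model unchanged until they are dialed in.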
Related papers
- SliderSpace: Decomposing the Visual Capabilities of Diffusion Models [50.82362500995365]
SliderSpace is a framework for automatically decomposing the visual capabilities of diffusion models.
It discovers multiple interpretable and diverse directions simultaneously from a single text prompt.
Our method produces more diverse and useful variations compared to baselines.
arXiv Detail & Related papers (2025-02-03T18:59:55Z)
- Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models [53.385754347812835]
Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects).
This approach adds parameters and increases inference time due to the loading and unloading of Low-Rank Adapters (LoRAs) used for learning concepts.
We propose a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder.
arXiv Detail & Related papers (2024-09-25T01:02:30Z)
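As a hedged illustration of the Prompt Sliders idea, the sketch below treats a concept as a single learned token embedding whose contribution is scaled at inference time; the names, shapes, and placeholder position are assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class PromptSlider(nn.Module):
    """A concept learned as a single token embedding via textual inversion.

    The vector lives in the text encoder's embedding space, so it adds no
    weights to the diffusion model and transfers to any model that shares
    the same text encoder.
    """

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.concept = nn.Parameter(torch.zeros(embed_dim))  # optimized during training

    def apply(self, prompt_embeds: torch.Tensor, position: int, scale: float):
        # add the scaled concept vector at the placeholder token's position;
        # `scale` plays the role of the slider value
        out = prompt_embeds.clone()
        out[:, position] = out[:, position] + scale * self.concept
        return out


slider = PromptSlider()
prompt_embeds = torch.randn(1, 77, 768)  # e.g. a CLIP text encoder output
edited = slider.apply(prompt_embeds, position=5, scale=1.8)
```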
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
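A minimal sketch of the heatmap-injection idea follows, assuming the heatmap is encoded by a small convolution and added to an intermediate activation; the exact layers and fusion used in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_heatmap(h: int, w: int, center, sigma: float) -> torch.Tensor:
    """A 2D Gaussian bump: randomly sampled during training, placed by the
    user at inference time to move or remove objects."""
    ys = torch.arange(h, dtype=torch.float32).unsqueeze(1)
    xs = torch.arange(w, dtype=torch.float32).unsqueeze(0)
    cy, cx = center
    return torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

class HeatmapInjection(nn.Module):
    """Encodes a heatmap and adds it to an intermediate feature map as a
    spatial inductive bias."""

    def __init__(self, channels: int):
        super().__init__()
        self.encode = nn.Conv2d(1, channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, heatmap: torch.Tensor) -> torch.Tensor:
        hm = heatmap[None, None]  # (1, 1, H, W)
        hm = F.interpolate(hm, size=feats.shape[-2:], mode="bilinear",
                           align_corners=False)
        return feats + self.encode(hm)

feats = torch.randn(1, 256, 16, 16)  # an intermediate generator activation
hm = gaussian_heatmap(64, 64, center=(20.0, 40.0), sigma=6.0)
steered = HeatmapInjection(256)(feats, hm)
```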
- Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate this problem in a text-driven manner with Contrastive Language-Image Pre-training (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
arXiv Detail & Related papers (2022-07-06T17:02:25Z)
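The sketch below shows a simplified directional CLIP loss in the spirit of this summary; the paper's actual contrastive formulation is more elaborate, and the tensors here stand in for features from a frozen CLIP encoder.

```python
import torch
import torch.nn.functional as F

def clip_direction_loss(img_src: torch.Tensor, img_edit: torch.Tensor,
                        txt_src: torch.Tensor, txt_tgt: torch.Tensor) -> torch.Tensor:
    """Push the image-space edit direction toward a predefined CLIP text
    direction. All four inputs are features from a frozen CLIP model."""
    d_img = F.normalize(img_edit - img_src, dim=-1)
    d_txt = F.normalize(txt_tgt - txt_src, dim=-1)
    return 1.0 - (d_img * d_txt).sum(dim=-1).mean()

# with CLIP features for a (source, target) text pair and a batch of edits:
feat = lambda: torch.randn(4, 512)
loss = clip_direction_loss(feat(), feat(), feat(), feat())
```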
- GANSlider: How Users Control Generative Models for Images using Multiple Sliders with and without Feedforward Information [33.28541180149195]
We investigate how multiple sliders with and without feedforward visualizations influence users' control of generative models.
We found that more control dimensions (sliders) significantly increase task difficulty and user actions.
Visualizations alone are not always sufficient for users to understand individual control dimensions.
arXiv Detail & Related papers (2022-02-02T11:25:07Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
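A schematic of the energy-based embedding described above, under strong assumptions: `generator` and `face_id` are stand-ins for a pretrained StyleGAN and a frozen face-recognition network, and the real method additionally routes control through StyleRig's 3D morphable model space.

```python
import torch
import torch.nn.functional as F

def embed_portrait(generator, face_id, target, w_init, steps=200, lam_id=0.1):
    """Optimize a latent so the generated image matches the photo, while an
    identity preservation energy keeps edits anchored to the same person."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=0.01)
    id_target = face_id(target).detach()
    for _ in range(steps):
        img = generator(w)
        recon = F.mse_loss(img, target)                              # reconstruction energy
        identity = 1.0 - F.cosine_similarity(face_id(img), id_target,
                                             dim=-1).mean()          # identity preservation
        loss = recon + lam_id * identity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

# Dummy stand-ins so the sketch runs end to end; a real setup would use a
# pretrained StyleGAN generator and a frozen face-recognition network.
generator = torch.nn.Linear(512, 3 * 64 * 64).requires_grad_(False)
face_id = torch.nn.Linear(3 * 64 * 64, 128).requires_grad_(False)
target = torch.randn(1, 3 * 64 * 64)
w = embed_portrait(generator, face_id, target, w_init=torch.randn(1, 512), steps=10)
```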
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.