Related papers: Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models

URL: http://arxiv.org/abs/2409.16535v1
Date: Wed, 25 Sep 2024 01:02:30 GMT
Title: Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models
Authors: Deepak Sridhar, Nuno Vasconcelos
Abstract summary: Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects). This approach adds parameters and increases inference time due to the loading and unloading of Low-Rank Adapters (LoRAs) used for learning concepts. We propose a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder.
Score: 53.385754347812835
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Diffusion models have recently surpassed GANs in image synthesis and editing, offering superior image quality and diversity. However, achieving precise control over attributes in generated images remains a challenge. Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects). However, this approach adds parameters and increases inference time due to the loading and unloading of Low-Rank Adapters (LoRAs) used for learning concepts. These adapters are model-specific and require retraining for different architectures, such as Stable Diffusion (SD) v1.5 and SD-XL. In this paper, we propose a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder, including different versions of the SD model. We refer to our method as Prompt Sliders. Besides learning new concepts, we also show that Prompt Sliders can be used to erase undesirable concepts such as artistic styles or mature content. Our method is 30% faster than using LoRAs because it eliminates the need to load and unload adapters and introduces no additional parameters aside from the target concept text embedding. Each concept embedding only requires 3KB of storage compared to the 8922KB or more required for each LoRA adapter, making our approach more computationally efficient. Project Page: https://deepaksridhar.github.io/promptsliders.github.io/
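Note on the storage figures: the 3KB per concept quoted above is consistent with storing a single token embedding. The sketch below checks that arithmetic under an assumption the abstract does not state: one concept is one 768-dimensional float32 vector (768 being the text-embedding width of the CLIP encoder used by SD v1.5). The helper names are hypothetical and only illustrate the calculation.

```python
# Assumption (not stated in the abstract): one concept is stored as a single
# 768-dimensional float32 token embedding, as in SD v1.5's CLIP text encoder.
EMBED_DIM = 768
BYTES_PER_FLOAT32 = 4
LORA_SIZE_KB = 8922  # per-adapter LoRA size quoted in the abstract


def embedding_size_kb(dim: int = EMBED_DIM, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw size of one embedding vector in KB (1 KB = 1024 bytes)."""
    return dim * bytes_per_value / 1024


def storage_ratio(lora_kb: float, embedding_kb: float) -> float:
    """How many times larger the LoRA adapter is than the embedding."""
    return lora_kb / embedding_kb


if __name__ == "__main__":
    emb_kb = embedding_size_kb()
    print(f"embedding size: {emb_kb} KB")
    print(f"LoRA / embedding ratio: {storage_ratio(LORA_SIZE_KB, emb_kb):.0f}x")
```

Under this assumption the raw vector is exactly 3 KB, so the quoted LoRA adapter is roughly three thousand times larger per concept. File-format overhead is ignored here.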

Related papers

Conceptrol: Concept Control of Zero-shot Personalized Image Generation [36.39574513193442]
Conceptrol is a framework that enhances zero-shot adapters without adding computational overhead. It achieves as much as 89% improvement on personalization benchmarks over the vanilla IP-Adapter.
arXiv Detail & Related papers (2025-03-09T11:54:08Z)
Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations [10.86252546314626]
Text-to-image generative models are prone to adversarial attacks and inadvertently generate unsafe, unethical content. We propose a novel framework leveraging k-sparse autoencoders (k-SAEs) to enable efficient and interpretable concept manipulation. Our method yields an improvement of $\mathbf{20.01\%}$ in unsafe concept removal, is effective in style manipulation, and is $\mathbf{\sim 5}$x faster than current state-of-the-art.
arXiv Detail & Related papers (2025-01-31T11:52:47Z)
LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z)
Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning [0.0]
We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning. Our method can erase a concept within 10 s, making concept erasure more accessible than ever before.
arXiv Detail & Related papers (2024-05-12T14:01:05Z)
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [52.894213114914805]
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. A slider is created using a small set of prompts or sample images. Our method can help address persistent quality issues in Stable Diffusion XL, including repair of object deformations and fixing distorted hands.
arXiv Detail & Related papers (2023-11-20T18:59:01Z)
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [73.74570290836152]
BLIP-Diffusion is a new subject-driven image generation model that supports multimodal control. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation.
arXiv Detail & Related papers (2023-05-24T04:51:04Z)
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models. Our method leverages a pretrained large language model for grounded generation in a novel two-stage process. Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z)
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA [64.10981296843609]
We show that recent state-of-the-art customization of text-to-image models suffers from catastrophic forgetting when new concepts arrive sequentially. We propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross attention layers of the popular Stable Diffusion model. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification.
arXiv Detail & Related papers (2023-04-12T17:59:41Z)
Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization. We employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain. Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
