Concept-centric Personalization with Large-scale Diffusion Priors
- URL: http://arxiv.org/abs/2312.08195v1
- Date: Wed, 13 Dec 2023 14:59:49 GMT
- Title: Concept-centric Personalization with Large-scale Diffusion Priors
- Authors: Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song
- Abstract summary: We present the task of customizing large-scale diffusion priors for specific concepts as conceptcentric personalization.
Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to openworld models.
- Score: 7.684688573874212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite large-scale diffusion models being highly capable of generating
diverse open-world content, they still struggle to match the photorealism and
fidelity of concept-specific generators. In this work, we present the task of
customizing large-scale diffusion priors for specific concepts as
concept-centric personalization. Our goal is to generate high-quality
concept-centric images while maintaining the versatile controllability inherent
to open-world models, enabling applications in diverse tasks such as
concept-centric stylization and image translation. To tackle these challenges,
we identify catastrophic forgetting of guidance prediction from diffusion
priors as the fundamental issue. Consequently, we develop a guidance-decoupled
personalization framework specifically designed to address this task. We
propose Generalized Classifier-free Guidance (GCFG) as the foundational theory
for our framework. This approach extends Classifier-free Guidance (CFG) to
accommodate an arbitrary number of guidances, sourced from a variety of
conditions and models. Employing GCFG enables us to separate conditional
guidance into two distinct components: concept guidance for fidelity and
control guidance for controllability. This division makes it feasible to train
a specialized model for concept guidance, while ensuring both control and
unconditional guidance remain intact. We then present a null-text
Concept-centric Diffusion Model as a concept-specific generator to learn
concept guidance without the need for text annotations. Code will be available
at https://github.com/PRIV-Creation/Concept-centric-Personalization.
Related papers
- ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts.
Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models.
We present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv Detail & Related papers (2024-07-09T17:50:28Z) - Non-confusing Generation of Customized Concepts in Diffusion Models [135.4385383284657]
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs)
Existing customized generation methods only focus on fine-tuning the second stage while overlooking the first one.
We propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning.
arXiv Detail & Related papers (2024-05-11T05:01:53Z) - Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which is confined to customize on limited modalities.
We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z) - MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce the Multi-concept guidance for Multi-concept customization, termed MC$2$, for improved flexibility and fidelity.
MC$2$ decouples the requirements for model architecture via inference time optimization.
It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
arXiv Detail & Related papers (2024-04-08T07:59:04Z) - LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as the challenging task within this domain.
Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image.
LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z) - Coarse-to-Fine Concept Bottleneck Models [9.910980079138206]
This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs)
Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on two levels of granularity.
Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene.
arXiv Detail & Related papers (2023-10-03T14:57:31Z) - Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept
Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs)
The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge.
We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z) - Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z) - Unsupervised Learning of Compositional Energy Concepts [70.11673173291426]
We propose COMET, which discovers and represents concepts as separate energy functions.
Comet represents both global concepts as well as objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.