Concept-centric Personalization with Large-scale Diffusion Priors
- URL: http://arxiv.org/abs/2312.08195v1
- Date: Wed, 13 Dec 2023 14:59:49 GMT
- Title: Concept-centric Personalization with Large-scale Diffusion Priors
- Authors: Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song
- Abstract summary: We present the task of customizing large-scale diffusion priors for specific concepts as concept-centric personalization.
Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to open-world models.
- Score: 7.684688573874212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite large-scale diffusion models being highly capable of generating
diverse open-world content, they still struggle to match the photorealism and
fidelity of concept-specific generators. In this work, we present the task of
customizing large-scale diffusion priors for specific concepts as
concept-centric personalization. Our goal is to generate high-quality
concept-centric images while maintaining the versatile controllability inherent
to open-world models, enabling applications in diverse tasks such as
concept-centric stylization and image translation. To tackle these challenges,
we identify catastrophic forgetting of guidance prediction from diffusion
priors as the fundamental issue. Consequently, we develop a guidance-decoupled
personalization framework specifically designed to address this task. We
propose Generalized Classifier-free Guidance (GCFG) as the foundational theory
for our framework. This approach extends Classifier-free Guidance (CFG) to
accommodate an arbitrary number of guidances, sourced from a variety of
conditions and models. Employing GCFG enables us to separate conditional
guidance into two distinct components: concept guidance for fidelity and
control guidance for controllability. This division makes it feasible to train
a specialized model for concept guidance, while ensuring both control and
unconditional guidance remain intact. We then present a null-text
Concept-centric Diffusion Model as a concept-specific generator to learn
concept guidance without the need for text annotations. Code will be available
at https://github.com/PRIV-Creation/Concept-centric-Personalization.
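To make the guidance-decoupling idea concrete, here is a minimal sketch of how standard CFG might extend to several guidance terms. The abstract does not give the GCFG formula, so the additive form below (an unconditional prediction plus a weighted sum of "conditional minus reference" differences) is an assumption, and the names `cfg` and `generalized_cfg` are hypothetical. Noise predictions are plain lists of floats so that the example stays self-contained.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# One guidance term: (conditional prediction, reference prediction, weight).
Guidance = Tuple[Sequence[float], Sequence[float], float]


def cfg(eps_uncond: Sequence[float], eps_cond: Sequence[float], weight: float) -> Vector:
    """Standard classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def generalized_cfg(eps_uncond: Sequence[float], guidances: Sequence[Guidance]) -> Vector:
    """Assumed additive generalization: sum several weighted guidance terms.

    Each term contributes weight * (eps_cond - eps_ref), where the two
    predictions may come from different conditions or different models.
    """
    out = list(eps_uncond)
    for eps_cond, eps_ref, weight in guidances:
        for i, (c, r) in enumerate(zip(eps_cond, eps_ref)):
            out[i] += weight * (c - r)
    return out


# Toy predictions standing in for model outputs (illustrative values only).
prior_uncond = [0.0, 1.0]   # large-scale prior, no condition
prior_text = [1.0, 3.0]     # large-scale prior, text condition (control guidance)
concept_null = [0.5, 1.5]   # concept-specific model, null text (concept guidance)

combined = generalized_cfg(
    prior_uncond,
    [
        (prior_text, prior_uncond, 2.0),    # control guidance
        (concept_null, prior_uncond, 1.0),  # concept guidance
    ],
)
```

With a single guidance term whose reference is the unconditional prediction, `generalized_cfg` reduces to ordinary CFG. In this sketch the control and unconditional terms are computed from the prior alone, which mirrors the abstract's point that they can remain intact while a separate model supplies concept guidance.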
Related papers
- How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM).
It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner.
Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
- Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization.
We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept.
Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
- ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts.
Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models.
We present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv Detail & Related papers (2024-07-09T17:50:28Z)
- Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performance.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
- Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which confines customization to limited modalities.
We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z)
- MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce the Multi-concept guidance for Multi-concept customization, termed MC$^2$, for improved flexibility and fidelity.
MC$^2$ decouples the requirements for model architecture via inference time optimization.
It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
- LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as a challenging task within this domain.
Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image.
LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z)
- Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs).
The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge.
We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z)
- Unsupervised Learning of Compositional Energy Concepts [70.11673173291426]
We propose COMET, which discovers and represents concepts as separate energy functions.
COMET represents both global concepts and objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.