Related papers: Concept-centric Personalization with Large-scale Diffusion Priors

Concept-centric Personalization with Large-scale Diffusion Priors

URL: http://arxiv.org/abs/2312.08195v1
Date: Wed, 13 Dec 2023 14:59:49 GMT
Title: Concept-centric Personalization with Large-scale Diffusion Priors
Authors: Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song
Abstract summary: We present the task of customizing large-scale diffusion priors for specific concepts as conceptcentric personalization. Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to openworld models.
Score: 7.684688573874212
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite large-scale diffusion models being highly capable of generating diverse open-world content, they still struggle to match the photorealism and fidelity of concept-specific generators. In this work, we present the task of customizing large-scale diffusion priors for specific concepts as concept-centric personalization. Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to open-world models, enabling applications in diverse tasks such as concept-centric stylization and image translation. To tackle these challenges, we identify catastrophic forgetting of guidance prediction from diffusion priors as the fundamental issue. Consequently, we develop a guidance-decoupled personalization framework specifically designed to address this task. We propose Generalized Classifier-free Guidance (GCFG) as the foundational theory for our framework. This approach extends Classifier-free Guidance (CFG) to accommodate an arbitrary number of guidances, sourced from a variety of conditions and models. Employing GCFG enables us to separate conditional guidance into two distinct components: concept guidance for fidelity and control guidance for controllability. This division makes it feasible to train a specialized model for concept guidance, while ensuring both control and unconditional guidance remain intact. We then present a null-text Concept-centric Diffusion Model as a concept-specific generator to learn concept guidance without the need for text annotations. Code will be available at https://github.com/PRIV-Creation/Concept-centric-Personalization.

Related papers

Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model [62.11981915549919]
Domain Guidance is a transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain. We demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD$_textDINOv2$ compared to standard fine-tuning.
arXiv Detail & Related papers (2025-04-02T09:07:55Z)
Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM) It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization. We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts. Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models. We present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv Detail & Related papers (2024-07-09T17:50:28Z)
Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models [36.59260354292177]
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models. We aim to fine-tune vision-language models to a specific classification model without access to any real images. Despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets.
arXiv Detail & Related papers (2024-06-08T10:43:49Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performances. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which is confined to customize on limited modalities. We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z)
MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce the Multi-concept guidance for Multi-concept customization, termed MC$2$, for improved flexibility and fidelity. MC$2$ decouples the requirements for model architecture via inference time optimization. It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation [14.211024633768986]
The rapid expansion of texts' volume and diversity presents formidable challenges in multi-domain settings. Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations. This paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters.
arXiv Detail & Related papers (2024-04-02T22:15:48Z)
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z)
HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization [69.33162366130887]
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. We introduce a novel method designed to supplement the model with domain-level and task-specific characteristics. This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting the generalization.
arXiv Detail & Related papers (2024-01-18T04:23:21Z)
DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data [20.998032566820907]
This paper proposes a novel DomainStudio approach to adapt DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by source domains and get high-quality and diverse adapted samples in target domains.
arXiv Detail & Related papers (2023-06-25T07:40:39Z)
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs) The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z)
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks. We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation [6.2200089460762085]
Methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models. We develop an information-theoretic bound on the generalization error of the resulting target model. We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment.
arXiv Detail & Related papers (2022-02-01T22:34:18Z)
A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario. It is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-ware center regularization (DCR)
arXiv Detail & Related papers (2022-01-24T18:09:38Z)
Unsupervised Learning of Compositional Energy Concepts [70.11673173291426]
We propose COMET, which discovers and represents concepts as separate energy functions. Comet represents both global concepts as well as objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
Learning to Generate Novel Domains for Domain Generalization [115.21519842245752]
This paper focuses on the task of learning from multiple source domains a model that generalizes well to unseen domains. We employ a data generator to synthesize data from pseudo-novel domains to augment the source domains. Our method, L2A-OT, outperforms current state-of-the-art DG methods on four benchmark datasets.
arXiv Detail & Related papers (2020-07-07T09:34:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.