Related papers: Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

URL: http://arxiv.org/abs/2305.18292v2
Date: Fri, 10 Nov 2023 00:01:55 GMT
Title: Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Authors: Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou
Abstract summary: Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs) The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
Score: 72.67967883658957
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, the utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adaptor) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.

Related papers

Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation [73.16975077770765]
Modular customization is essential for applications like concept stylization and multi-concept customization. Instant merging methods often cause identity loss and interference of individual merged concepts. We propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving individual concepts' identity.
arXiv Detail & Related papers (2025-03-11T16:10:36Z)
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [62.70911549650579]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model. LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference. Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
arXiv Detail & Related papers (2024-12-12T18:59:55Z)
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM) It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performances. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement. Inspired by it, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet) Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce the Multi-concept guidance for Multi-concept customization, termed MC$2$, for improved flexibility and fidelity. MC$2$ decouples the requirements for model architecture via inference time optimization. It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z)
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models [47.63060402915307]
OMG is a framework designed to seamlessly integrate multiple concepts within a single image. OMG exhibits superior performance in multi-concept personalization. LoRA models on civitai.com can be exploited directly.
arXiv Detail & Related papers (2024-03-16T17:30:15Z)
Concept-centric Personalization with Large-scale Diffusion Priors [7.684688573874212]
We present the task of customizing large-scale diffusion priors for specific concepts as conceptcentric personalization. Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to openworld models.
arXiv Detail & Related papers (2023-12-13T14:59:49Z)
FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients [32.59184269562571]
We propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where the two modal data are inputted into separate branches of the encoder. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure.
arXiv Detail & Related papers (2023-11-16T02:29:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.