Orthogonal Adaptation for Modular Customization of Diffusion Models
- URL: http://arxiv.org/abs/2312.02432v2
- Date: Fri, 6 Sep 2024 03:00:09 GMT
- Title: Orthogonal Adaptation for Modular Customization of Diffusion Models
- Authors: Ryan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein,
- Abstract summary: We address a new problem called Modular Customization, with the goal of efficiently merging customized models.
We introduce Orthogonal Adaptation, a method designed to encourage the customized models, which do not have access to each other during fine-tuning.
Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture.
- Score: 39.62438974450659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods facilitate high-fidelity customization for individual concepts or a limited, pre-defined set of them, they fall short of achieving scalability, where a single model can seamlessly render countless concepts. In this paper, we address a new problem called Modular Customization, with the goal of efficiently merging customized models that were fine-tuned independently for individual concepts. This allows the merged model to jointly synthesize concepts in one image without compromising fidelity or incurring any additional computational costs. To address this problem, we introduce Orthogonal Adaptation, a method designed to encourage the customized models, which do not have access to each other during fine-tuning, to have orthogonal residual weights. This ensures that during inference time, the customized models can be summed with minimal interference. Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture. Through an extensive set of quantitative and qualitative evaluations, our method consistently outperforms relevant baselines in terms of efficiency and identity preservation, demonstrating a significant leap toward scalable customization of diffusion models.
Related papers
- You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging [11.186194228460273]
We propose preference-aware model merging in which the performance of the merged model on each base model's task is treated as an objective.
In only one merging process, the proposed parameter-efficient structure can generate the whole set of merged models.
We show that the proposed preference-aware model merging can obtain a diverse set of trade-off models and outperforms state-of-the-art model merging baselines.
arXiv Detail & Related papers (2024-08-22T03:41:14Z) - JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (jedi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z) - Revisiting Implicit Models: Sparsity Trade-offs Capability in
Weight-tied Model for Vision Tasks [4.872984658007499]
Implicit models such as Deep Equilibrium Models (DEQs) have garnered significant attention in the community for their ability to train infinite layer models.
We revisit the line of implicit models and trace them back to the original weight-tied models.
Surprisingly, we observe that weight-tied models are more effective, stable, as well as efficient on vision tasks, compared to the DEQ variants.
arXiv Detail & Related papers (2023-07-16T11:45:35Z) - Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image
Classification and Generation [0.0]
We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model.
Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding.
arXiv Detail & Related papers (2023-07-15T07:53:12Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Multi-Concept Customization of Text-to-Image Diffusion [51.8642043743222]
We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models.
We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts.
Our model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings.
arXiv Detail & Related papers (2022-12-08T18:57:02Z) - Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC)
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z) - A Nested Weighted Tchebycheff Multi-Objective Bayesian Optimization
Approach for Flexibility of Unknown Utopia Estimation in Expensive Black-box
Design Problems [0.0]
In existing work, a weighted Tchebycheff MOBO approach has been demonstrated which attempts to estimate the unknown utopia in formulating acquisition function.
We propose a nested weighted Tchebycheff MOBO framework where we build a regression model selection procedure from an ensemble of models.
arXiv Detail & Related papers (2021-10-16T00:44:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.