DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized
Diffusion Models
- URL: http://arxiv.org/abs/2402.17412v2
- Date: Wed, 28 Feb 2024 09:49:32 GMT
- Title: DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized
Diffusion Models
- Authors: Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu,
Pin-Yu Chen
- Abstract summary: DiffuseKronA is a Kronecker product-based adaptation module for subject-driven text-to-image (T2I) generative models.
It significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively.
It can achieve up to a 50% reduction with results comparable to LoRA-DreamBooth.
- Score: 46.58122934173729
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of subject-driven text-to-image (T2I) generative models, recent
developments like DreamBooth and BLIP-Diffusion have led to impressive results
yet encounter limitations due to their intensive fine-tuning demands and
substantial parameter requirements. While the low-rank adaptation (LoRA) module
within DreamBooth offers a reduction in trainable parameters, it introduces a
pronounced sensitivity to hyperparameters, leading to a compromise between
parameter efficiency and the quality of T2I personalized image synthesis.
Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker
product-based adaptation module that not only significantly reduces the
parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original
DreamBooth, respectively, but also enhances the quality of image synthesis.
Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity,
delivering consistent high-quality generations across a wide range of
hyperparameters, thereby diminishing the necessity for extensive fine-tuning.
Furthermore, a more controllable decomposition makes DiffuseKronA more
interpretable, and it can even achieve up to a 50% reduction in parameters with
results comparable to LoRA-DreamBooth. Evaluated against diverse and complex
input images and text prompts, DiffuseKronA consistently outperforms existing
models, producing diverse images of higher quality with improved fidelity and
more accurate color distribution of objects, all while upholding exceptional
parameter efficiency, thus presenting a substantial advancement in the field of
T2I generative modeling. Our project page, with links to the code and
pre-trained checkpoints, is available at https://diffusekrona.github.io/.
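To make the core mechanism concrete, here is a minimal PyTorch sketch of a Kronecker product-based adapter on a frozen linear layer. The factor shapes, zero initialization of one factor, and scaling are illustrative assumptions, not the authors' exact implementation (see the project code for that).

```python
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Adapt a frozen linear layer with a Kronecker-factored update:
    W_eff = W + scale * (A kron B), with small trainable factors A and B."""

    def __init__(self, base: nn.Linear, a1: int = 8, a2: int = 8, scale: float = 1.0):
        super().__init__()
        d_out, d_in = base.weight.shape
        assert d_out % a1 == 0 and d_in % a2 == 0, "factor shapes must divide W"
        self.base = base.requires_grad_(False)      # pretrained weights stay frozen
        self.A = nn.Parameter(torch.zeros(a1, a2))  # zero init => no-op at start
        self.B = nn.Parameter(torch.randn(d_out // a1, d_in // a2) * 0.01)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = torch.kron(self.A, self.B)          # full (d_out, d_in) update
        return self.base(x) + self.scale * (x @ delta.T)

layer = KroneckerAdapter(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)             # torch.Size([2, 768])
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8*8 + 96*96 = 9280, vs. 2*768*r for a rank-r LoRA on this layer
```

Since rank(A ⊗ B) = rank(A) · rank(B), a factored update of this shape can reach much higher rank than a LoRA update of comparable size, which is one intuition for the quality and controllability claims above.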
Related papers
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that effectively adapts large pre-trained models to downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
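As an illustration of the idea, the sketch below parametrizes the updates of all layers jointly with a shared low-rank (CP-style) tensor factorization; which modes are shared and how is an assumption here, not necessarily LoRTA's exact construction.

```python
import torch
import torch.nn as nn

class CPLayerUpdates(nn.Module):
    """Joint low-rank tensor parametrization of per-layer weight updates:
    delta[l] = sum_r s[l, r] * outer(u[:, r], v[:, r]).
    Factors u, v are shared across all layers; s mixes them per layer."""

    def __init__(self, n_layers: int, d_out: int, d_in: int, rank: int):
        super().__init__()
        self.u = nn.Parameter(torch.randn(d_out, rank) * 0.01)
        self.v = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.s = nn.Parameter(torch.zeros(n_layers, rank))  # zero init => no-op start

    def delta(self, layer: int) -> torch.Tensor:
        # Scale each column of u by this layer's mixing weights, contract with v.
        return (self.u * self.s[layer]) @ self.v.T          # (d_out, d_in)

updates = CPLayerUpdates(n_layers=12, d_out=768, d_in=768, rank=4)
print(updates.delta(3).shape)    # torch.Size([768, 768])
print(sum(p.numel() for p in updates.parameters()))
# 768*4 + 768*4 + 12*4 = 6192; twelve independent rank-4 LoRAs would need
# 12 * 2 * 768 * 4 = 73728, so sharing factors across layers is the saving.
```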
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
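A toy sketch of the construction: extract a low-rank expert from each fine-tuned model's weight delta via SVD (no extra training needed), then route inputs sparsely among them. The gating rule here, projection strength onto each expert's input subspace, is a stand-in assumption rather than the paper's routing rule.

```python
import torch

def lowrank_expert(w_base, w_finetuned, rank):
    """Distill a fine-tuned model's weight delta into a rank-r expert via SVD."""
    u, s, vh = torch.linalg.svd(w_finetuned - w_base, full_matrices=False)
    return u[:, :rank] * s[:rank], vh[:rank, :]     # (d_out, r), (r, d_in)

def smile_forward(x, w_base, experts, top_k=1):
    y = x @ w_base.T                                 # shared base path
    # Gate: how strongly does each input project onto each expert's subspace?
    scores = torch.stack([(x @ v.T).norm(dim=-1) for _, v in experts], dim=-1)
    idx = scores.topk(top_k, dim=-1).indices         # (batch, top_k)
    for b in range(x.shape[0]):                      # sparse: only top-k experts run
        for e in idx[b]:
            u, v = experts[e]
            y[b] += (x[b] @ v.T) @ u.T
    return y

d, r = 64, 4
w0 = torch.randn(d, d)
finetuned = [w0 + 0.1 * torch.randn(d, d) for _ in range(3)]  # stand-in source models
experts = [lowrank_expert(w0, w, r) for w in finetuned]
print(smile_forward(torch.randn(5, d), w0, experts).shape)    # torch.Size([5, 64])
```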
- PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction [38.424899483761656]
PaRa is an effective and efficient Rank Reduction approach for T2I model personalization.
Our design is motivated by the fact that taming a T2I model toward a novel concept implies a small generation space.
We show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing.
arXiv Detail & Related papers (2024-06-09T04:51:51Z)
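The summary does not spell out the construction, so the sketch below shows one generic way to impose a rank-reduction constraint on a frozen layer, projecting k learned directions out of its output space; treat it as an illustration of shrinking the generation space, not PaRa's actual scheme.

```python
import torch
import torch.nn as nn

class RankReducedLinear(nn.Module):
    """Generic rank-reduction illustration: W_eff = (I - Q Q^T) W with Q
    column-orthonormal, so the layer's output span shrinks by k dimensions,
    confining the model to a smaller generation space."""

    def __init__(self, base: nn.Linear, k: int):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.raw = nn.Parameter(torch.randn(base.weight.shape[0], k) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, _ = torch.linalg.qr(self.raw)   # orthonormalize the k directions
        y = self.base(x)
        return y - (y @ q) @ q.T           # project the k directions out

layer = RankReducedLinear(nn.Linear(768, 768), k=8)
print(layer(torch.randn(2, 768)).shape)    # torch.Size([2, 768])
```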
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
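A hyperplane reflection is a Householder matrix H = I - 2uu^T with unit u; it is orthogonal, so it can neither blow up nor collapse the pretrained weights, which is the intuition behind the family's robustness. A minimal sketch (applied to layer outputs for simplicity; ETHER itself transforms the weights):

```python
import torch
import torch.nn as nn

class HouseholderAdapter(nn.Module):
    """Finetune a frozen layer with a single trainable hyperplane reflection:
    outputs are reflected across the hyperplane with unit normal u, i.e.
    multiplied by H = I - 2 u u^T. Only the vector u is trained."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.u = nn.Parameter(torch.randn(base.weight.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.u / self.u.norm()                   # unit hyperplane normal
        y = self.base(x)
        return y - 2.0 * (y @ u).unsqueeze(-1) * u   # H y, norm-preserving

layer = HouseholderAdapter(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768]); only 768 new params
```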
- Advancing Parameter Efficiency in Fine-tuning via Representation Editing [41.81020951061438]
We propose a novel fine-tuning approach for neural models, named Representation EDiting (RED).
RED modifies the representations generated at some layers through the application of scaling and biasing operations.
Remarkably, RED achieves results comparable or superior to both full parameter fine-tuning and other PEFT methods.
arXiv Detail & Related papers (2024-02-23T08:21:02Z)
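The mechanism is small enough to state in full: freeze every pretrained weight and train only a per-dimension scale and bias applied to chosen hidden representations. A minimal sketch (where to attach the editor is a design choice, assumed here to be a block output):

```python
import torch
import torch.nn as nn

class RepresentationEditor(nn.Module):
    """Representation editing: all pretrained weights stay frozen; only a
    per-dimension scale and bias on a layer's hidden states are trained."""

    def __init__(self, hidden: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(hidden))   # identity at init
        self.bias = nn.Parameter(torch.zeros(hidden))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h * self.scale + self.bias

edit = RepresentationEditor(768)
h = torch.randn(2, 16, 768)                  # (batch, tokens, hidden)
print(edit(h).shape)                         # torch.Size([2, 16, 768])
print(sum(p.numel() for p in edit.parameters()))  # 1536 params per edited layer
```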
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
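The base mechanism of visual prompt tuning, sketched below, is to freeze the backbone and learn a few extra tokens prepended to the patch sequence; E^2VPT's additional key-value prompts and prompt pruning are omitted here for brevity.

```python
import torch
import torch.nn as nn

class VisualPromptWrapper(nn.Module):
    """Freeze a transformer backbone and train only a few prompt tokens
    that are prepended to the input token sequence."""

    def __init__(self, encoder: nn.Module, d_model: int, n_prompts: int):
        super().__init__()
        self.encoder = encoder.requires_grad_(False)
        self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.shape[0]
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)   # (b, n_prompts, d)
        return self.encoder(torch.cat([p, patch_tokens], dim=1))

enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(256, 4, batch_first=True), 2)
model = VisualPromptWrapper(enc, d_model=256, n_prompts=8)
print(model(torch.randn(2, 196, 256)).shape)   # torch.Size([2, 204, 256])
```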
- Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z)
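Concretely, OFT-style adaptation multiplies a frozen weight matrix by a learned orthogonal matrix, so pairwise angles between neuron weight vectors (the hyperspherical energy) are preserved. The sketch below builds the orthogonal factor with a Cayley transform and uses a single dense block for clarity; the paper's method uses a block-diagonal structure for efficiency.

```python
import torch
import torch.nn as nn

class OrthogonalFinetune(nn.Module):
    """Rotate a frozen layer's weight by a learned orthogonal matrix:
    W_eff = R W, with R = (I - S)^{-1} (I + S) (Cayley transform) and
    S skew-symmetric, so R^T R = I holds by construction."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)
        d_out = base.weight.shape[0]
        self.q = nn.Parameter(torch.zeros(d_out, d_out))  # S = 0 => R = I at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.q - self.q.T                             # skew-symmetric part
        eye = torch.eye(s.shape[0], device=s.device)
        r = torch.linalg.solve(eye - s, eye + s)          # Cayley: always orthogonal
        y = x @ (r @ self.base.weight).T
        return y if self.base.bias is None else y + self.base.bias

layer = OrthogonalFinetune(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)   # torch.Size([2, 64])
```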
- SVDiff: Compact Parameter Space for Diffusion Fine-Tuning [19.978410014103435]
We propose a novel approach to address limitations in existing text-to-image diffusion models for personalization.
Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space.
We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework.
arXiv Detail & Related papers (2023-03-20T17:45:02Z)
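The compact parameter space here is the spectrum itself: decompose each frozen weight once as W = U diag(σ) V^T and train only a shift δ of the singular values. A minimal sketch for one linear layer, assuming a ReLU keeps the shifted spectrum non-negative:

```python
import torch
import torch.nn as nn

class SpectralShiftLinear(nn.Module):
    """Fine-tune only the singular values of a frozen weight:
    W_eff = U diag(relu(sigma + delta)) V^T, with delta trainable."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        u, s, vh = torch.linalg.svd(base.weight.detach(), full_matrices=False)
        self.register_buffer("u", u)      # frozen SVD factors, computed once
        self.register_buffer("s", s)
        self.register_buffer("vh", vh)
        self.bias = None if base.bias is None else base.bias.requires_grad_(False)
        self.delta = nn.Parameter(torch.zeros_like(s))   # the only trainable part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = (self.u * torch.relu(self.s + self.delta)) @ self.vh
        y = x @ w.T
        return y if self.bias is None else y + self.bias

layer = SpectralShiftLinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)   # torch.Size([2, 768])
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 768, not 768*768
```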
- Learning the Effect of Registration Hyperparameters with HyperMorph [7.313453912494172]
We introduce HyperMorph, a framework that facilitates efficient hyperparameter tuning in learning-based deformable image registration.
We show that it enables fast, high-resolution hyperparameter search at test-time, reducing the inefficiency of traditional approaches.
arXiv Detail & Related papers (2022-03-30T21:30:06Z)
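The core trick, sketched below, is a hypernetwork: instead of retraining for each hyperparameter value, a small network maps the hyperparameter (for example a regularization weight lambda) to the weights of the task network, so the hyperparameter can be swept at test time. The shapes and the toy one-layer task network are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Map a hyperparameter value to the weights of a small task network,
    so one training run covers a continuum of hyperparameter settings."""

    def __init__(self, target_in: int, target_out: int, hidden: int = 64):
        super().__init__()
        self.t_in, self.t_out = target_in, target_out
        self.n_w = target_out * target_in + target_out   # weight + bias count
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, self.n_w)
        )

    def forward(self, lam: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        theta = self.net(lam.view(1, 1)).squeeze(0)      # generated parameters
        w = theta[: self.t_out * self.t_in].view(self.t_out, self.t_in)
        b = theta[self.t_out * self.t_in :]
        return x @ w.T + b                               # run the generated layer

hyper = HyperNetwork(target_in=32, target_out=16)
x = torch.randn(4, 32)
for lam in (0.1, 0.5, 0.9):     # test-time hyperparameter sweep, no retraining
    print(hyper(torch.tensor(lam), x).shape)   # torch.Size([4, 16])
```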