PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
- URL: http://arxiv.org/abs/2406.05641v1
- Date: Sun, 9 Jun 2024 04:51:51 GMT
- Authors: Shangyu Chen, Zizheng Pan, Jianfei Cai, Dinh Phung
- Abstract summary: PaRa is an effective and efficient Rank Reduction approach for T2I model personalization.
Our design is motivated by the fact that taming a T2I model toward a novel concept implies a small generation space.
We show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing.
- Score: 38.424899483761656
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with diverse text prompts). In this paper, we propose PaRa, an effective and efficient Parameter Rank Reduction approach for T2I model personalization by explicitly controlling the rank of the diffusion model parameters to restrict its initial diverse generation space into a small and well-balanced target space. Our design is motivated by the fact that taming a T2I model toward a novel concept such as a specific art style implies a small generation space. To this end, by reducing the rank of model parameters during finetuning, we can effectively constrain the space of the denoising sampling trajectories towards the target. With comprehensive experiments, we show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing. Notably, compared to the prevailing fine-tuning technique LoRA, PaRa achieves better parameter efficiency (2x fewer learnable parameters) and much better target image alignment.
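The core idea of the abstract, constraining a model's generation space by explicitly lowering the rank of its weight matrices, can be illustrated with a small linear-algebra sketch. This is not the paper's actual algorithm; the names `W`, `B`, and `k` are our own illustrative notation, and the projection used here is one simple way to remove a rank-k subspace from a weight's input space:

```python
import numpy as np

# Illustrative sketch of parameter rank reduction (not PaRa's exact method):
# subtracting the projection onto k directions lowers the rank of a weight
# matrix, shrinking the space of outputs the layer can produce.

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 64, 8

W = rng.standard_normal((d_out, d_in))               # pretrained weight, full rank
B = np.linalg.qr(rng.standard_normal((d_in, k)))[0]  # orthonormal basis, shape (d_in, k)

# Remove the rank-k subspace spanned by B from W's input space:
W_reduced = W @ (np.eye(d_in) - B @ B.T)

print(np.linalg.matrix_rank(W))          # 64
print(np.linalg.matrix_rank(W_reduced))  # 56  (= 64 - 8)
```

Because `B` is orthonormal, `I - B @ B.T` is an exact projector, so the reduced weight loses exactly `k` rank: every denoising step through this layer is confined to a smaller output subspace, which is the geometric intuition the abstract appeals to.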
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation [2.2356314962198836]
The objective of personalization and stylization in text-to-image is to instruct a pre-trained diffusion model to analyze new concepts introduced by users and incorporate them into expected styles.
We propose block-wise Low-Rank Adaptation (LoRA) to perform fine-grained fine-tuning for different blocks of SD.
arXiv Detail & Related papers (2024-03-12T10:38:03Z)
- DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models [46.58122934173729]
DiffuseKronA is a Kronecker-product-based adaptation module for subject-driven text-to-image (T2I) generative models.
It significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively.
It can achieve up to a 50% reduction with results comparable to LoRA-DreamBooth.
arXiv Detail & Related papers (2024-02-27T11:05:34Z)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation [69.72194342962615]
We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model.
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
arXiv Detail & Related papers (2024-01-11T18:59:14Z)
- Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z)
- SVDiff: Compact Parameter Space for Diffusion Fine-Tuning [19.978410014103435]
We propose a novel approach to address limitations in existing text-to-image diffusion models for personalization.
Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space.
We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework.
arXiv Detail & Related papers (2023-03-20T17:45:02Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
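Several of the methods listed above (LoRA, Block-wise LoRA, SVDiff, PaRa) share a common theme: adapting a frozen pretrained weight matrix through its low-rank or spectral structure rather than fine-tuning it directly. A minimal numpy sketch of the two most common variants, with illustrative dimensions and variable names of our own rather than any paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 32, 32, 4
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight

# LoRA-style update: learn a rank-r additive correction W + A @ B.
# Only A and B are trained: d_out*r + r*d_in parameters instead of
# the full d_out*d_in.
A = rng.standard_normal((d_out, r)) * 0.01
B = rng.standard_normal((r, d_in)) * 0.01
W_lora = W + A @ B

# SVDiff-style update: keep the singular vectors U, Vt of W fixed and
# fine-tune only the singular values s (one parameter per singular value).
U, s, Vt = np.linalg.svd(W)
delta = rng.standard_normal(s.shape) * 0.01       # learned shift (illustrative)
W_svdiff = U @ np.diag(np.maximum(s + delta, 0.0)) @ Vt

print(A.size + B.size)  # 256 trainable parameters for the LoRA sketch
print(s.size)           # 32 trainable parameters for the SVDiff sketch
```

The contrast makes the parameter-efficiency claims in the list concrete: a spectral method trains O(min(d_out, d_in)) values per matrix, a rank-r additive method trains O(r * (d_out + d_in)), and both are far smaller than full fine-tuning.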
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.