Block-wise LoRA: Revisiting Fine-grained LoRA for Effective
Personalization and Stylization in Text-to-Image Generation
- URL: http://arxiv.org/abs/2403.07500v1
- Date: Tue, 12 Mar 2024 10:38:03 GMT
- Authors: Likun Li, Haoqi Zeng, Changpeng Yang, Haozhe Jia, Di Xu
- Abstract summary: The objective of personalization and stylization in text-to-image is to instruct a pre-trained diffusion model to analyze new concepts introduced by users and incorporate them into expected styles.
We propose block-wise Low-Rank Adaptation (LoRA) to perform fine-grained fine-tuning for different blocks of SD.
- Score: 2.2356314962198836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The objective of personalization and stylization in text-to-image
generation is to instruct a pre-trained diffusion model to analyze new concepts
introduced by users and incorporate them into expected styles. Recently, parameter-efficient
fine-tuning (PEFT) approaches have been widely adopted to address this task and
have greatly propelled the development of this field. Despite their popularity,
existing efficient fine-tuning methods still struggle to achieve effective
personalization and stylization in T2I generation. To address this issue, we
propose block-wise Low-Rank Adaptation (LoRA) to perform fine-grained
fine-tuning for different blocks of Stable Diffusion (SD), which can generate
images faithful to the input prompts and target identity while also exhibiting the desired style. Extensive
experiments demonstrate the effectiveness of the proposed method.
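The block-wise idea can be illustrated with a minimal sketch (all class names, dimensions, and per-block ranks below are illustrative assumptions, not the paper's actual configuration): each block of the network receives its own LoRA update, parameterized with an independently chosen rank, while the base weights stay frozen.

```python
import numpy as np

class LoRALinear:
    """A linear layer with a low-rank update: y = x @ (W + (alpha/r) * B @ A).T.

    W is the frozen base weight; only A and B would be trained.
    """
    def __init__(self, in_dim, out_dim, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((rank, in_dim)) * 0.01     # trainable down-projection
        self.B = np.zeros((out_dim, rank))                      # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Zero-initialized B makes the LoRA update a no-op before training.
        return x @ (self.W + self.scale * (self.B @ self.A)).T

# Block-wise fine-tuning: assign a different rank to each block of a toy network
# (the block names and rank values are hypothetical).
block_ranks = {"down": 4, "mid": 16, "up": 8}
blocks = {name: LoRALinear(32, 32, rank) for name, rank in block_ranks.items()}

x = np.ones((1, 32))
for name, layer in blocks.items():
    x = layer(x)
```

In this sketch, tuning `block_ranks` is the fine-grained control knob: a larger rank gives a block more capacity to absorb the new concept or style, while a rank of zero would leave that block effectively frozen.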
Related papers
- DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion [42.38655393158855]
We propose DiffLoRA, a novel approach that leverages diffusion models as a hypernetwork to predict personalized low-rank adaptation weights.
By integrating these LoRA weights into the text-to-image model, DiffLoRA achieves personalization during inference without further training.
arXiv Detail & Related papers (2024-08-13T09:00:35Z)
- TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization [59.412236435627094]
TALE is a training-free framework harnessing the generative capabilities of text-to-image diffusion models.
We equip TALE with two mechanisms dubbed Adaptive Latent Manipulation and Energy-guided Latent Optimization.
Our experiments demonstrate that TALE surpasses prior baselines and attains state-of-the-art performance in image-guided composition.
arXiv Detail & Related papers (2024-08-07T08:52:21Z)
- Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [40.06403155373455]
We propose a novel reinforcement learning framework for personalized text-to-image generation.
Our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text-alignment.
arXiv Detail & Related papers (2024-07-09T08:11:53Z)
- PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction [38.424899483761656]
PaRa is an effective and efficient Rank Reduction approach for T2I model personalization.
Our design is motivated by the fact that taming a T2I model toward a novel concept implies a small generation space.
We show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing.
arXiv Detail & Related papers (2024-06-09T04:51:51Z)
- StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models [35.732715025002705]
StyleInject is a specialized fine-tuning approach tailored for text-to-image models.
It adapts to varying styles by adjusting the variance of visual features based on the characteristics of the input signal.
It proves particularly effective in learning from and enhancing a range of advanced, community-fine-tuned generative models.
arXiv Detail & Related papers (2024-01-25T04:53:03Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach sets new performance records for depth estimation on NYU Depth V2 and KITTI, and for semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs [56.85106417530364]
Low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization.
We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs.
Experiments show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity.
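The merging idea can be sketched in simplified form (function names, shapes, and the merge rule below are assumptions for illustration, not ZipLoRA's exact formulation): each LoRA's rank-one components are scaled by learned per-column merger coefficients before the two updates are summed, which lets the merge down-weight components that interfere.

```python
import numpy as np

def merge_loras(B_s, A_s, B_c, A_c, m_s, m_c):
    """Merge a style LoRA (B_s @ A_s) and a subject LoRA (B_c @ A_c).

    m_s and m_c are per-column merger coefficients (one per rank-one
    component); in a real method they would be learned, here they are inputs.
    """
    # Scaling the columns of B scales each rank-one component B[:, j] @ A[j, :].
    delta_style = (B_s * m_s) @ A_s
    delta_subject = (B_c * m_c) @ A_c
    return delta_style + delta_subject
```

With all coefficients set to 1 this reduces to naive addition of the two LoRA deltas; the interesting regime is coefficients below 1 on components that the two adapters contest.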
arXiv Detail & Related papers (2023-11-22T18:59:36Z)
- Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models [59.094601993993535]
Text-to-image (T2I) personalization allows users to combine their own visual concepts in natural language prompts.
Most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts.
We propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts.
arXiv Detail & Related papers (2023-07-13T17:46:42Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.