Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
- URL: http://arxiv.org/abs/2403.19776v2
- Date: Tue, 29 Jul 2025 20:38:01 GMT
- Title: Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
- Authors: Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag,
- Abstract summary: Low-Rank Adaptation (LoRA) has emerged as a powerful and popular technique for personalization.<n>Existing methods often fall short because the attention mechanisms within different LoRA models overlap.<n>We introduce CLoRA, a training-free approach that addresses these limitations by updating the attention maps of multiple LoRA models at test-time.
- Score: 44.037664077117945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-Rank Adaptation (LoRA) has emerged as a powerful and popular technique for personalization, enabling efficient adaptation of pre-trained image generation models for specific tasks without comprehensive retraining. While employing individual pre-trained LoRA models excels at representing single concepts, such as those representing a specific dog or a cat, utilizing multiple LoRA models to capture a variety of concepts in a single image still poses a significant challenge. Existing methods often fall short, primarily because the attention mechanisms within different LoRA models overlap, leading to scenarios where one concept may be completely ignored (e.g., omitting the dog) or where concepts are incorrectly combined (e.g., producing an image of two cats instead of one cat and one dog). We introduce CLoRA, a training-free approach that addresses these limitations by updating the attention maps of multiple LoRA models at test-time, and leveraging the attention maps to create semantic masks for fusing latent representations. This enables the generation of composite images that accurately reflect the characteristics of each LoRA. Our comprehensive qualitative and quantitative evaluations demonstrate that CLoRA significantly outperforms existing methods in multi-concept image generation using LoRAs.
Related papers
- Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation [81.92275347127833]
A key challenge in developing unified models lies in the inherent differences between the visual features needed for image understanding versus generation.<n>In this work, we introduce Pisces, an auto-regressive multimodal foundation model that addresses this challenge through a novel decoupled visual encoding architecture.<n>Combined with meticulous data curation, pretraining, and finetuning, Pisces achieves competitive performance in both image understanding and image generation.
arXiv Detail & Related papers (2025-06-12T06:37:34Z) - Cached Multi-Lora Composition for Multi-Concept Image Generation [10.433033595844442]
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models.
Current approaches face significant challenges when composing these LoRAs for multi-concept image generation.
We introduce a novel, training-free framework, Cached Multi-LoRA (CMLoRA), designed to efficiently integrate multiple LoRAs.
arXiv Detail & Related papers (2025-02-07T13:41:51Z) - A LoRA is Worth a Thousand Pictures [28.928964530616593]
Low Rank Adaptation (LoRA) can replicate an artist's style or subject using minimal data and computation.
We show that LoRA weights alone can serve as an effective descriptor of style, without the need for additional image generation or knowledge of the original training set.
We conclude with a discussion on potential future applications, such as zero-shot LoRA fine-tuning and model attribution.
arXiv Detail & Related papers (2024-12-16T18:18:17Z) - LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [62.70911549650579]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model.
LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference.
Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
arXiv Detail & Related papers (2024-12-12T18:59:55Z) - LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation [28.098287135605364]
We introduce LoRA.rar, a method that improves image quality and achieves a remarkable speedup of over $4000times$ in the merging process.
LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs.
Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
arXiv Detail & Related papers (2024-12-06T16:04:56Z) - LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs.
We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model.
Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z) - LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks [73.09643674975591]
Low-Rank Adaptation (LoRA) is a technique for parameter-efficient fine-tuning of Large Language Models (LLMs)
We study how different LoRA modules can be merged to achieve skill composition.
arXiv Detail & Related papers (2024-10-16T20:33:06Z) - AutoLoRA: AutoGuidance Meets Low-Rank Adaptation for Diffusion Models [0.9514837871243403]
Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models.
We introduce AutoLoRA, a novel guidance technique for diffusion models fine-tuned with the LoRA approach.
arXiv Detail & Related papers (2024-10-04T21:57:11Z) - UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation [50.27688690379488]
Existing unified methods treat multi-degradation image restoration as a multi-task learning problem.
We propose a universal image restoration framework based on multiple low-rank adapters (LoRA) from multi-domain transfer learning.
Our framework leverages the pre-trained generative model as the shared component for multi-degradation restoration and transfers it to specific degradation image restoration tasks.
arXiv Detail & Related papers (2024-09-30T11:16:56Z) - DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion [43.55179971287028]
We propose DiffLoRA, an efficient method that leverages the diffusion model as a hypernetwork to predict personalized Low-Rank Adaptation weights.
By incorporating these LoRA weights into the off-the-shelf text-to-image model, DiffLoRA enables zero-shot personalization during inference.
We introduce a novel identity-oriented LoRA weights construction pipeline to facilitate the training process of DiffLoRA.
arXiv Detail & Related papers (2024-08-13T09:00:35Z) - Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z) - Implicit Style-Content Separation using B-LoRA [61.664293840163865]
We introduce B-LoRA, a method that implicitly separate the style and content components of a single image.
By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks achieves style-content separation.
arXiv Detail & Related papers (2024-03-21T17:20:21Z) - LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as the challenging task within this domain.
Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image.
LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z) - Multi-LoRA Composition for Image Generation [107.83002438126832]
We study multi-LoRA composition through a decoding-centric perspective.
We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis.
arXiv Detail & Related papers (2024-02-26T18:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.