CLoRA: A Contrastive Approach to Compose Multiple LoRA Models

URL: http://arxiv.org/abs/2403.19776v1
Date: Thu, 28 Mar 2024 18:58:43 GMT
Title: CLoRA: A Contrastive Approach to Compose Multiple LoRA Models
Authors: Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag
Abstract summary: Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation. CLoRA addresses the problem of seamlessly blending multiple concept LoRAs to capture a variety of concepts in one image. Our method enables the creation of composite images that truly reflect the characteristics of each LoRA.
Score: 44.037664077117945
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation, offering a highly effective way to adapt and refine pre-trained deep learning models for specific tasks without the need for comprehensive retraining. By employing pre-trained LoRA models, such as those representing a specific cat and a particular dog, the objective is to generate an image that faithfully embodies both animals as defined by the LoRAs. However, the task of seamlessly blending multiple concept LoRAs to capture a variety of concepts in one image proves to be a significant challenge. Common approaches often fall short, primarily because the attention mechanisms within different LoRA models overlap, leading to scenarios where one concept may be completely ignored (e.g., omitting the dog) or where concepts are incorrectly combined (e.g., producing an image of two cats instead of one cat and one dog). To overcome these issues, CLoRA updates the attention maps of multiple LoRA models and leverages them to create semantic masks that facilitate the fusion of latent representations. Our method enables the creation of composite images that truly reflect the characteristics of each LoRA, successfully merging multiple concepts or styles. Our comprehensive evaluations, both qualitative and quantitative, demonstrate that our approach outperforms existing methodologies, marking a significant advancement in the field of image generation with LoRAs. Furthermore, we share our source code, benchmark dataset, and trained LoRA models to promote further research on this topic.
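
The abstract describes turning per-LoRA attention maps into semantic masks that then guide the fusion of latent representations. A minimal sketch of that general idea follows. It is not the authors' implementation: the abstract does not say how the masks are built, so the argmax-plus-threshold rule, the flat-list representation of maps and latents, and the function names `masks_from_attention` and `fuse_latents` are all assumptions made for illustration.

```python
def masks_from_attention(attn_maps, threshold=0.5):
    """Build one binary mask per concept from per-concept attention maps.

    Assumption (not from the paper): a position is assigned to the concept
    with the strongest attention there, but only if that attention reaches
    `threshold`; otherwise the position belongs to no concept.
    """
    n_pos = len(attn_maps[0])
    masks = [[0] * n_pos for _ in attn_maps]
    for p in range(n_pos):
        # Index of the concept with the highest attention at position p.
        winner = max(range(len(attn_maps)), key=lambda k: attn_maps[k][p])
        if attn_maps[winner][p] >= threshold:
            masks[winner][p] = 1
    return masks


def fuse_latents(base_latent, concept_latents, masks):
    """Copy each concept's latent values into the base latent where its mask is 1."""
    fused = list(base_latent)
    for latent, mask in zip(concept_latents, masks):
        for p, m in enumerate(mask):
            if m:
                fused[p] = latent[p]
    return fused


# Toy example: four spatial positions, two concepts ("cat" and "dog").
attn_cat = [0.9, 0.7, 0.1, 0.2]
attn_dog = [0.1, 0.2, 0.8, 0.3]
masks = masks_from_attention([attn_cat, attn_dog])

base = [0.0, 0.0, 0.0, 0.0]
cat_latent = [1.0, 1.0, 1.0, 1.0]
dog_latent = [2.0, 2.0, 2.0, 2.0]
fused = fuse_latents(base, [cat_latent, dog_latent], masks)
print(masks)  # [[1, 1, 0, 0], [0, 0, 1, 0]]
print(fused)  # [1.0, 1.0, 2.0, 0.0]
```

In this toy setup the masks are disjoint by construction, so each position takes its value from at most one concept. That mirrors the failure mode the abstract names, where overlapping attention causes one concept to be dropped or duplicated.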

Related papers

Cached Multi-Lora Composition for Multi-Concept Image Generation [10.433033595844442]
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models. Current approaches face significant challenges when composing these LoRAs for multi-concept image generation. We introduce a novel, training-free framework, Cached Multi-LoRA (CMLoRA), designed to efficiently integrate multiple LoRAs.
arXiv Detail & Related papers (2025-02-07T13:41:51Z)
A LoRA is Worth a Thousand Pictures [28.928964530616593]
Low Rank Adaptation (LoRA) can replicate an artist's style or subject using minimal data and computation. We show that LoRA weights alone can serve as an effective descriptor of style, without the need for additional image generation or knowledge of the original training set. We conclude with a discussion on potential future applications, such as zero-shot LoRA fine-tuning and model attribution.
arXiv Detail & Related papers (2024-12-16T18:18:17Z)
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [62.70911549650579]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model. LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference. Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
arXiv Detail & Related papers (2024-12-12T18:59:55Z)
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation [28.098287135605364]
We introduce LoRA.rar, a method that improves image quality and achieves a remarkable speedup of over $4000\times$ in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
arXiv Detail & Related papers (2024-12-06T16:04:56Z)
LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z)
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks [73.09643674975591]
Low-Rank Adaptation (LoRA) is a technique for parameter-efficient fine-tuning of Large Language Models (LLMs). We study how different LoRA modules can be merged to achieve skill composition.
arXiv Detail & Related papers (2024-10-16T20:33:06Z)
AutoLoRA: AutoGuidance Meets Low-Rank Adaptation for Diffusion Models [0.9514837871243403]
Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models. We introduce AutoLoRA, a novel guidance technique for diffusion models fine-tuned with the LoRA approach.
arXiv Detail & Related papers (2024-10-04T21:57:11Z)
UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation [50.27688690379488]
Existing unified methods treat multi-degradation image restoration as a multi-task learning problem. We propose a universal image restoration framework based on multiple low-rank adapters (LoRA) from multi-domain transfer learning. Our framework leverages the pre-trained generative model as the shared component for multi-degradation restoration and transfers it to specific degradation image restoration tasks.
arXiv Detail & Related papers (2024-09-30T11:16:56Z)
DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion [43.55179971287028]
We propose DiffLoRA, an efficient method that leverages the diffusion model as a hypernetwork to predict personalized Low-Rank Adaptation weights. By incorporating these LoRA weights into the off-the-shelf text-to-image model, DiffLoRA enables zero-shot personalization during inference. We introduce a novel identity-oriented LoRA weights construction pipeline to facilitate the training process of DiffLoRA.
arXiv Detail & Related papers (2024-08-13T09:00:35Z)
Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection. The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z)
Implicit Style-Content Separation using B-LoRA [61.664293840163865]
We introduce B-LoRA, a method that implicitly separates the style and content components of a single image. By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks achieves style-content separation.
arXiv Detail & Related papers (2024-03-21T17:20:21Z)
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z)
Multi-LoRA Composition for Image Generation [107.83002438126832]
We study multi-LoRA composition through a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis.
arXiv Detail & Related papers (2024-02-26T18:59:18Z)