Related papers: LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation


URL: http://arxiv.org/abs/2412.05148v1
Date: Fri, 06 Dec 2024 16:04:56 GMT
Title: LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
Authors: Donald Shenaj, Ondrej Bohdal, Mete Ozay, Pietro Zanuttigh, Umberto Michieli
Abstract summary: We introduce LoRA.rar, a method that improves image quality and achieves a remarkable speedup of over $4000\times$ in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
Score: 28.098287135605364
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in image generation models have enabled personalized image creation with both user-defined subjects (content) and styles. Prior works achieved personalization by merging corresponding low-rank adaptation parameters (LoRAs) through optimization-based methods, which are computationally demanding and unsuitable for real-time use on resource-constrained devices like smartphones. To address this, we introduce LoRA.rar, a method that not only improves image quality but also achieves a remarkable speedup of over $4000\times$ in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs, enabling fast, high-quality personalization. Moreover, we identify limitations in existing evaluation metrics for content-style quality and propose a new protocol using multimodal large language models (MLLM) for more accurate assessment. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
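
The abstract does not spell out the form of the merge, so the following is only a minimal sketch, assuming a column-wise weighted sum of the two LoRA weight updates (the formulation used by ZipLoRA, listed below). The function `predict_coefficients` is a hypothetical stand-in for the trained hypernetwork and simply returns fixed weights; all names are illustrative and not taken from the paper.

```python
# Hypothetical sketch: merge a content LoRA and a style LoRA with per-column
# coefficients. Names and the merge form are assumptions, not the paper's code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A):
    """Weight update of one LoRA: delta_W = B @ A (B is d x r, A is r x k)."""
    return matmul(B, A)

def merge_deltas(delta_c, delta_s, m_c, m_s):
    """Column-wise weighted sum:
    out[i][j] = m_c[j] * delta_c[i][j] + m_s[j] * delta_s[i][j]."""
    return [
        [mc * c + ms * s for c, s, mc, ms in zip(row_c, row_s, m_c, m_s)]
        for row_c, row_s in zip(delta_c, delta_s)
    ]

def predict_coefficients(delta_c, delta_s):
    """Stand-in for the trained hypernetwork (assumption): fixed 0.5 weights."""
    k = len(delta_c[0])
    return [0.5] * k, [0.5] * k

# Rank-1 toy LoRAs for a 2 x 2 weight matrix.
delta_content = lora_delta([[1.0], [2.0]], [[1.0, 0.0]])
delta_style = lora_delta([[0.0], [1.0]], [[2.0, 4.0]])

m_c, m_s = predict_coefficients(delta_content, delta_style)
merged = merge_deltas(delta_content, delta_style, m_c, m_s)
print(merged)
```

In this reading, the speedup would come from replacing a per-pair optimization of the coefficients with a single forward pass of the pre-trained predictor.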

Related papers

DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition [69.10628479553709]
We introduce DRC, a novel personalized image generation framework that enhances Large Multimodal Models (LMMs). DRC explicitly extracts user style preferences and semantic intentions from history images and the reference image, respectively. It involves two critical learning stages: 1) Disentanglement learning, which employs a dual-tower disentangler to explicitly separate style and semantic features, optimized via a reconstruction-driven paradigm with difficulty-aware importance sampling; and 2) Personalized modeling, which applies semantic-preserving augmentations to effectively adapt the disentangled representations for robust personalized generation.
arXiv Detail & Related papers (2025-04-24T08:10:10Z)
AC-LoRA: Auto Component LoRA for Personalized Artistic Style Image Generation [2.2820583483778045]
AC-LoRA is able to automatically separate the signal component and noise component of the LoRA matrices for fast and efficient personalized artistic style image generation. Results were validated using FID, CLIP, DINO, and ImageReward, achieving an average of 9% improvement.
arXiv Detail & Related papers (2025-04-03T02:56:01Z)
Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization [5.874782446136915]
We introduce Meta-LoRA, a framework that separates identity-agnostic knowledge from identity-specific adaptation. Our results demonstrate that Meta-LoRA achieves superior identity retention, computational efficiency, and adaptability across diverse identity conditions.
arXiv Detail & Related papers (2025-03-28T11:47:33Z)
ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer [20.088714830700916]
Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. These approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage.
arXiv Detail & Related papers (2025-03-13T17:55:58Z)
Cached Multi-Lora Composition for Multi-Concept Image Generation [10.433033595844442]
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models. Current approaches face significant challenges when composing these LoRAs for multi-concept image generation. We introduce a novel, training-free framework, Cached Multi-LoRA (CMLoRA), designed to efficiently integrate multiple LoRAs.
arXiv Detail & Related papers (2025-02-07T13:41:51Z)
CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL). Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [62.70911549650579]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model. LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference. Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
arXiv Detail & Related papers (2024-12-12T18:59:55Z)
LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair [116.48684498656871]
We propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. We learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
arXiv Detail & Related papers (2024-11-28T13:55:06Z)
IterIS: Iterative Inference-Solving Alignment for LoRA Merging [14.263218227928729]
Low-rank adaptations (LoRAs) are widely used to fine-tune large models across various domains for specific downstream tasks. LoRA merging presents an effective solution by combining multiple LoRAs into a unified adapter while maintaining data privacy.
arXiv Detail & Related papers (2024-11-21T19:04:02Z)
UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation [50.27688690379488]
Existing unified methods treat multi-degradation image restoration as a multi-task learning problem. We propose a universal image restoration framework based on multiple low-rank adapters (LoRA) from multi-domain transfer learning. Our framework leverages the pre-trained generative model as the shared component for multi-degradation restoration and transfers it to specific degradation image restoration tasks.
arXiv Detail & Related papers (2024-09-30T11:16:56Z)
DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion [43.55179971287028]
We propose DiffLoRA, an efficient method that leverages the diffusion model as a hypernetwork to predict personalized Low-Rank Adaptation weights. By incorporating these LoRA weights into the off-the-shelf text-to-image model, DiffLoRA enables zero-shot personalization during inference. We introduce a novel identity-oriented LoRA weights construction pipeline to facilitate the training process of DiffLoRA.
arXiv Detail & Related papers (2024-08-13T09:00:35Z)
CLoRA: A Contrastive Approach to Compose Multiple LoRA Models [44.037664077117945]
Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation. CLoRA addresses the problem of seamlessly blending multiple concept LoRAs to capture a variety of concepts in one image. Our method enables the creation of composite images that truly reflect the characteristics of each LoRA.
arXiv Detail & Related papers (2024-03-28T18:58:43Z)
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs [56.85106417530364]
Low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs. Experiments show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity.
arXiv Detail & Related papers (2023-11-22T18:59:36Z)
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA [64.10981296843609]
We show that recent state-of-the-art customization of text-to-image models suffers from catastrophic forgetting when new concepts arrive sequentially. We propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross attention layers of the popular Stable Diffusion model. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification.
arXiv Detail & Related papers (2023-04-12T17:59:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.