When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
- URL: http://arxiv.org/abs/2602.21977v2
- Date: Thu, 05 Mar 2026 14:54:52 GMT
- Title: When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
- Authors: Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng
- Abstract summary: Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models. MasqLoRA is the first systematic attack framework that leverages an independent LoRA module as the attack vehicle. MasqLoRA achieves a high attack success rate of 99.8%.
- Score: 10.859491015719088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.
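The mechanism the abstract describes — freeze the base weights and train only a low-rank adapter on a few "trigger word-target image" pairs — can be sketched in plain PyTorch. This is a minimal toy illustration, not the paper's implementation: the `LoRALinear` wrapper, the rank, and the toy trigger/target tensors are all assumptions; in a real diffusion model the adapter would wrap attention projections and the loss would be the denoising objective.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update: W x + scale * (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # base model parameters stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

torch.manual_seed(0)
layer = LoRALinear(nn.Linear(8, 8), rank=2)
w0 = layer.base.weight.detach().clone()

# Toy stand-ins for "trigger word -> target image" pairs.
trigger = torch.randn(4, 8)   # embeddings of the textual trigger
target = torch.ones(4, 8)     # the attacker's predefined output
init_loss = nn.functional.mse_loss(layer(trigger), target).item()

# Only the low-rank adapter weights A and B are updated.
opt = torch.optim.Adam([layer.A, layer.B], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(layer(trigger), target)
    loss.backward()
    opt.step()
```

Because only `A` and `B` change, the adapter ships as a small standalone file, and the base weights remain byte-identical — which is what makes the backdoored module indistinguishable from a benign one until the trigger is supplied.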
Related papers
- AuthenLoRA: Entangling Stylization with Imperceptible Watermarks for Copyright-Secure LoRA Adapters [52.556959321030966]
Low-Rank Adaptation (LoRA) offers an efficient paradigm for customizing diffusion models. Existing watermarking techniques either target base models or verify LoRA modules themselves. We propose AuthenLoRA, a unified watermarking framework that embeds imperceptible, traceable watermarks directly into the LoRA training process.
arXiv Detail & Related papers (2025-11-26T09:48:11Z) - StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data [39.230850434780756]
This paper introduces a new focus of model extraction attacks named LoRA extraction. We propose a novel extraction method called StolenLoRA which trains a substitute model to extract the functionality of a LoRA-adapted model. Our experiments demonstrate the effectiveness of StolenLoRA, achieving up to a 96.60% attack success rate with only 10k queries.
arXiv Detail & Related papers (2025-09-28T02:51:35Z) - LoRAtorio: An intrinsic approach to LoRA Skill Composition [11.429106388558925]
Low-Rank Adaptation (LoRA) has become a widely adopted technique in text-to-image diffusion models. Existing approaches struggle to effectively compose multiple LoRA adapters. We present LoRAtorio, a novel train-free framework for multi-LoRA composition.
arXiv Detail & Related papers (2025-08-15T17:52:56Z) - LoRAShield: Data-Free Editing Alignment for Secure Personalized LoRA Sharing [43.88211522311429]
Low-Rank Adaptation (LoRA) models can be shared on platforms like Civitai and Liblib. LoRAShield is the first data-free editing framework for securing LoRA models against misuse.
arXiv Detail & Related papers (2025-07-05T02:53:17Z) - Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging [49.270050440553575]
We propose Merger-as-a-Stealer, a two-stage framework to achieve this attack. First, the attacker fine-tunes a malicious model to force it to respond to any PII-related queries. Second, the attacker inputs direct PII-related queries to the merged model to extract targeted PII.
arXiv Detail & Related papers (2025-02-22T05:34:53Z) - LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation [48.22550575107633]
A new adapter, Cross-Model Low-Rank Adaptation (LoRA-X), enables the training-free transfer of LoRA parameters across source and target models. Our experiments demonstrate the effectiveness of LoRA-X for text-to-image generation.
arXiv Detail & Related papers (2025-01-27T23:02:24Z) - LoBAM: LoRA-Based Backdoor Attack on Model Merging [27.57659381949931]
Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains. Existing works try to demonstrate the risk of such attacks by assuming substantial computational resources. We propose LoBAM, a method that yields high attack success rate with minimal training resources.
arXiv Detail & Related papers (2024-11-23T20:41:24Z) - Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z) - LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem [55.2986934528672]
We study how backdoors can be injected into task-enhancing LoRAs. We find that with a simple, efficient, yet specific recipe, a backdoor LoRA can be trained once and then seamlessly merged with multiple LoRAs. Our work is among the first to study this new threat model of training-free distribution of downstream-capable-yet-backdoor-injected LoRAs.
arXiv Detail & Related papers (2024-02-29T20:25:16Z) - The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models.
This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA.
For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$\left(\frac{\text{embedding size}}{2}\right)$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)
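The rank threshold above is stated with respect to the standard LoRA parameterization; the sketch below uses assumed notation ($W_0$, $B$, $A$, $d$, $k$, $r$), not symbols taken from that paper.

```latex
% LoRA keeps the pretrained weight W_0 frozen and learns a low-rank additive update:
W' = W_0 + \Delta W, \qquad \Delta W = BA,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k).
% Only A and B are trained, so the adapter stores r(d + k) parameters instead of dk.
% The expressivity result says that for Transformers, r = (embedding size)/2
% already suffices to adapt any model to a target model of the same size.
```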