LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario
- URL: http://arxiv.org/abs/2403.00108v1
- Date: Thu, 29 Feb 2024 20:25:16 GMT
- Title: LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario
- Authors: Hongyi Liu, Zirui Liu, Ruixiang Tang, Jiayi Yuan, Shaochen Zhong, Yu-Neng Chuang, Li Li, Rui Chen, Xia Hu
- Abstract summary: We study how to inject a backdoor into the LoRA module and dive deeper into LoRA's infection mechanisms.
Our aim is to raise awareness of the potential risks under the emerging share-and-play scenario, so as to proactively prevent potential consequences caused by LoRA-as-an-Attack.
- Score: 61.99243609126672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning LLMs is crucial to enhancing their task-specific performance and
ensuring model behaviors are aligned with human preferences. Among various
fine-tuning methods, LoRA is popular for its efficiency and ease of use,
allowing end-users to easily post and adopt lightweight LoRA modules on
open-source platforms to tailor their models for different customizations.
However, such a handy share-and-play setting opens up new attack surfaces: an
attacker can turn a LoRA module into an attack vector, for example through
backdoor injection, and easily distribute the adversarial LoRA widely to the
community. This can result in detrimental outcomes. Despite the huge potential
risks of sharing LoRA modules, this aspect has not been fully explored. To fill the gap, in
this study we thoroughly investigate the attack opportunities enabled in the
growing share-and-play scenario. Specifically, we study how to inject a backdoor
into the LoRA module and dive deeper into LoRA's infection mechanisms. We find
that a training-free mechanism is possible in LoRA backdoor injection. We also
examine the impact of backdoor attacks in the presence of multiple concurrent
LoRA adaptations, as well as LoRA-based backdoor transferability. Our aim
is to raise awareness of the potential risks under the emerging share-and-play
scenario, so as to proactively prevent potential consequences caused by
LoRA-as-an-Attack. Warning: the paper contains potentially offensive content
generated by models.
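To make the share-and-play setting concrete, the sketch below shows the standard LoRA weight arithmetic on a single linear layer: an adapter contributes a low-rank update (alpha / r) * B @ A, and adopting one or several adapters amounts to adding those updates to the frozen base weight. This is a minimal NumPy illustration, not the paper's injection procedure; the dimensions, the scaling factor `alpha`, the random matrices, and the helper names `make_adapter`, `lora_delta`, and `merge` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2   # hypothetical layer size and LoRA rank
alpha = 4.0                # hypothetical LoRA scaling factor

# Frozen base weight of one linear layer (random stand-in values).
W = rng.normal(size=(d_out, d_in))


def make_adapter(rng, d_out, d_in, r):
    """Return a (B, A) pair with B: (d_out, r) and A: (r, d_in)."""
    B = rng.normal(size=(d_out, r))
    A = rng.normal(size=(r, d_in))
    return B, A


def lora_delta(B, A, alpha):
    """Low-rank weight update contributed by one adapter."""
    r = A.shape[0]
    return (alpha / r) * (B @ A)


def merge(W, adapters, alpha):
    """Add the updates of all adapters to a copy of the base weight."""
    merged = W.copy()
    for B, A in adapters:
        merged += lora_delta(B, A, alpha)
    return merged


# Two independently produced adapters, e.g. downloaded from different sources.
B1, A1 = make_adapter(rng, d_out, d_in, r)
B2, A2 = make_adapter(rng, d_out, d_in, r)

x = rng.normal(size=d_in)

# Applying the adapters as side branches at inference time ...
y_unmerged = W @ x + lora_delta(B1, A1, alpha) @ x + lora_delta(B2, A2, alpha) @ x
# ... matches folding them into the weight once, with no training involved.
y_merged = merge(W, [(B1, A1), (B2, A2)], alpha) @ x

print(np.allclose(y_unmerged, y_merged))
```

Because the combination is purely additive, every adapter a user adopts changes the layer's output for all inputs, which is why the provenance of shared LoRA modules matters in this scenario.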
Related papers
- Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning [57.36978335727009]
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs).
In this paper, we propose a framework that adaptively retrieves and composes multiple LoRAs based on input prompts.
arXiv Detail & Related papers (2024-06-24T05:24:41Z)
- Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead [41.31302904190149]
Fine-tuning large language models with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates.
This paradigm presents challenges for systems that serve real-time responses to queries that each involve a different LoRA.
We consider compressing adapters individually via SVD and propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices.
arXiv Detail & Related papers (2024-06-17T15:21:35Z)
- Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z)
- Continual Forgetting for Pre-trained Vision Models [70.51165239179052]
In real-world scenarios, selective information is expected to be continuously removed from a pre-trained model.
We propose Group Sparse LoRA (GS-LoRA) for efficient and effective deleting.
We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes.
arXiv Detail & Related papers (2024-03-18T07:33:56Z)
- Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models [18.472894244598503]
Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images.
We propose a solution: Membership-Privacy-preserving LoRA (MP-LoRA).
We show that MP-LoRA has the issue of unstable optimization, and theoretically analyze that the potential reason is the unconstrained local smoothness.
Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images.
arXiv Detail & Related papers (2024-02-19T09:32:48Z)
- LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks [72.88244322513039]
LoRA employs lightweight modules to customize large language models (LLMs) for each downstream task or domain.
We propose LoRA-Flow, which utilizes dynamic weights to adjust the impact of different LoRAs.
Experiments across six generative tasks demonstrate that our method consistently outperforms baselines with task-level fusion weights.
arXiv Detail & Related papers (2024-02-18T04:41:25Z)
- LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild [76.67343971195267]
Low-Rank Adaptation (LoRA) provides an efficient solution for fine-tuning large language models (LLMs).
LoraRetriever is a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts.
Experimental results indicate that LoraRetriever consistently outperforms the baselines.
arXiv Detail & Related papers (2024-02-15T15:02:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.