LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario
- URL: http://arxiv.org/abs/2403.00108v1
- Date: Thu, 29 Feb 2024 20:25:16 GMT
- Title: LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario
- Authors: Hongyi Liu, Zirui Liu, Ruixiang Tang, Jiayi Yuan, Shaochen Zhong,
Yu-Neng Chuang, Li Li, Rui Chen, Xia Hu
- Abstract summary: We study how to inject a backdoor into the LoRA module and dive deeper into LoRA's infection mechanisms.
Our aim is to raise awareness of the potential risks under the emerging share-and-play scenario, so as to proactively prevent potential consequences caused by LoRA-as-an-Attack.
- Score: 61.99243609126672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning LLMs is crucial to enhancing their task-specific performance and
ensuring model behaviors are aligned with human preferences. Among various
fine-tuning methods, LoRA is popular for its efficiency and ease of use,
allowing end-users to easily post and adopt lightweight LoRA modules on
open-source platforms to tailor their models to different customizations.
However, such a handy share-and-play setting opens up new attack surfaces: an
attacker can weaponize LoRA as an attack vector, e.g., through backdoor
injection, and easily distribute the adversarial LoRA widely across the
community, with potentially detrimental outcomes. Despite the substantial
risks of sharing LoRA modules, this aspect has not been fully explored. To
fill the gap, in this study we thoroughly investigate the attack opportunities
enabled by the growing share-and-play scenario. Specifically, we study how to
inject a backdoor into the LoRA module and dive deeper into LoRA's infection
mechanisms. We find that a training-free mechanism for LoRA backdoor injection
is possible. We also examine the impact of backdoor attacks when multiple LoRA
adaptations are applied concurrently, as well as the transferability of
LoRA-based backdoors. Our aim is to raise awareness of the potential risks
under the emerging share-and-play scenario, so as to proactively prevent
potential consequences caused by LoRA-as-an-Attack. Warning: the paper
contains potentially offensive content generated by models.
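To make the attack surface concrete, here is a minimal PyTorch sketch assuming the standard LoRA parameterization y = Wx + (alpha/r)·BAx. The LoRALinear class and the concatenation-based merge_adapters helper are illustrative assumptions about how an adversarial low-rank delta could be folded into a benign adapter without any gradient steps; they are not the paper's exact injection mechanism.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # only the adapter is trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)


def merge_adapters(benign: LoRALinear, adversarial: LoRALinear) -> LoRALinear:
    """Training-free 'infection' sketch: stacking the low-rank factors yields
    one adapter of rank r1 + r2 whose update is the sum of both deltas
    (exact when the two scaling factors match; otherwise fold each scale
    into its B factor before concatenating)."""
    r_total = benign.A.shape[0] + adversarial.A.shape[0]
    merged = LoRALinear(benign.base, r=r_total)
    merged.scale = benign.scale
    with torch.no_grad():
        merged.A.copy_(torch.cat([benign.A, adversarial.A], dim=0))
        merged.B.copy_(torch.cat([benign.B, adversarial.B], dim=1))
    return merged

# usage: a benign rank-8 adapter silently merged with an adversarial rank-4 one
base = nn.Linear(768, 768)
infected = merge_adapters(LoRALinear(base, r=8), LoRALinear(base, r=4))
```

Because a shared adapter is just a pair of small tensors, nothing in the usual load path distinguishes the merged rank-(r1 + r2) module from an ordinary adapter, which is what makes the share-and-play distribution channel attractive to an attacker.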
Related papers
- Learning Attentional Mixture of LoRAs for Language Model Continual Learning [5.405488709294211]
Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach for continually learning new tasks.
We propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs.
arXiv Detail & Related papers (2024-09-29T08:34:54Z)
- Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning [57.36978335727009]
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs).
In this paper, we propose a framework that adaptively retrieves and composes multiple LoRAs based on input prompts (a generic sketch of this composition pattern follows the related-papers list below).
arXiv Detail & Related papers (2024-06-24T05:24:41Z)
- Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead [41.31302904190149]
Fine-tuning large language models with low-rank adaptations (LoRAs) has become common practice.
We propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices.
Experiments with up to 500 LoRAs demonstrate that compressed LoRAs preserve performance while offering major throughput gains.
arXiv Detail & Related papers (2024-06-17T15:21:35Z)
- Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z)
- Continual Forgetting for Pre-trained Vision Models [70.51165239179052]
In real-world scenarios, selective information is expected to be continuously removed from a pre-trained model.
We propose Group Sparse LoRA (GS-LoRA) for efficient and effective deletion.
We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes.
arXiv Detail & Related papers (2024-03-18T07:33:56Z)
- Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models [18.472894244598503]
Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images.
We propose a solution: Membership-Privacy-preserving LoRA (MP-LoRA).
We show that MP-LoRA suffers from unstable optimization and theoretically analyze that the potential reason is the unconstrained local smoothness, which motivates a stabilized variant, SMP-LoRA.
Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images.
arXiv Detail & Related papers (2024-02-19T09:32:48Z)
- LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild [76.67343971195267]
Low-Rank Adaptation (LoRA) provides an efficient solution for fine-tuning large language models (LLMs).
LoraRetriever is a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts.
Experimental results indicate that LoraRetriever consistently outperforms the baselines.
arXiv Detail & Related papers (2024-02-15T15:02:46Z)
- CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices [78.16679232748196]
We introduce a Compression-Aware LoRA (CA-LoRA) framework to transfer Large Language Models (LLMs) to other tasks.
Experiment results demonstrate that CA-LoRA outperforms the vanilla LoRA methods applied to a compressed LLM.
The source code of CA-LoRA is available at https://github.com/thunlp/CA-LoRA.
arXiv Detail & Related papers (2023-07-15T04:37:11Z)
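Several of the related papers above (AM-LoRA, the retrieval-augmented mixture, MoLE, LoraRetriever) compose multiple LoRA modules per input. The gated mixture below is a generic sketch of that pattern under assumed names; the softmax router in particular is an illustrative choice, not any specific paper's architecture.

```python
import torch
import torch.nn as nn

class LoRAMixtureLinear(nn.Module):
    """One frozen base layer with a pool of LoRA experts and an
    input-aware gate that mixes their low-rank updates."""
    def __init__(self, base: nn.Linear, n_experts: int, r: int = 8,
                 alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.A = nn.Parameter(torch.randn(n_experts, r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, base.out_features, r))
        self.gate = nn.Linear(base.in_features, n_experts)   # input-aware router
        self.scale = alpha / r

    def forward(self, x):                                    # x: (batch, d_in)
        weights = torch.softmax(self.gate(x), dim=-1)        # (batch, n_experts)
        # per-expert low-rank updates: (batch, n_experts, d_out)
        updates = torch.einsum('bi,eri,eor->beo', x, self.A, self.B)
        return self.base(x) + self.scale * torch.einsum('be,beo->bo',
                                                        weights, updates)

# usage: mix four hypothetical adapters over a 768-dim layer
layer = LoRAMixtureLinear(nn.Linear(768, 768), n_experts=4)
out = layer(torch.randn(2, 768))    # -> shape (2, 768)
```

Read against the main paper, this multi-adapter setting is also where its concurrent-LoRA findings apply: if any one adapter in the pool is backdoored, the router decides per input whether the poisoned update contributes to the output.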
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.