TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
- URL: http://arxiv.org/abs/2404.11121v1
- Date: Wed, 17 Apr 2024 07:08:45 GMT
- Title: TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
- Authors: Qinfeng Li, Zhiqiang Shen, Zhenghan Qin, Yangfan Xie, Xuhong Zhang, Tianyu Du, Jianwei Yin,
- Abstract summary: We propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices.
The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment.
Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.
- Score: 34.8682729537795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proprietary large language models (LLMs) have been widely applied in various scenarios. Additionally, deploying LLMs on edge devices is trending for efficiency and privacy reasons. However, edge deployment of proprietary LLMs introduces new security challenges: edge-deployed models are exposed as white-box accessible to users, enabling adversaries to conduct effective model stealing (MS) attacks. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify four critical protection properties that existing methods fail to simultaneously satisfy: (1) maintaining protection after a model is physically copied; (2) authorizing model access at request level; (3) safeguarding runtime reverse engineering; (4) achieving high security with negligible runtime overhead. To address the above issues, we propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices. The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment, e.g., TEE. The authorization module can freshly authorize each request based on its input. Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.
Related papers
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training [67.30423823744506]
This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs)
We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position.
DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful
arXiv Detail & Related papers (2024-07-12T09:36:33Z) - LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models [15.900125475191958]
Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs)
We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models.
We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
arXiv Detail & Related papers (2024-07-03T10:38:40Z) - SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance [48.80398992974831]
SafeAligner is a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks.
We develop two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses.
We show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones.
arXiv Detail & Related papers (2024-06-26T07:15:44Z) - Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks [59.46556573924901]
This paper introduces Defensive Prompt Patch (DPP), a novel prompt-based defense mechanism for large language models (LLMs)
Unlike previous approaches, DPP is designed to achieve a minimal Attack Success Rate (ASR) while preserving the high utility of LLMs.
Empirical results conducted on LLAMA-2-7B-Chat and Mistral-7B-Instruct-v0.2 models demonstrate the robustness and adaptability of DPP.
arXiv Detail & Related papers (2024-05-30T14:40:35Z) - PRP: Propagating Universal Perturbations to Attack Large Language Model
Guard-Rails [26.090757124460552]
Large language models (LLMs) are typically aligned to be harmless to humans.
Recent work has shown that such models are susceptible to automated jailbreak attacks that induce them to generate harmful content.
Our key contribution is to show a novel attack strategy, PRP, that is successful against several open-source (e.g., Llama 2) and closed-source (e.g., GPT 3.5) implementations of Guard Models.
arXiv Detail & Related papers (2024-02-24T21:27:13Z) - AdaptGuard: Defending Against Universal Attacks for Model Adaptation [129.2012687550069]
We study the vulnerability to universal attacks transferred from the source domain during model adaptation algorithms.
We propose a model preprocessing framework, named AdaptGuard, to improve the security of model adaptation algorithms.
arXiv Detail & Related papers (2023-03-19T07:53:31Z) - CrowdGuard: Federated Backdoor Detection in Federated Learning [39.58317527488534]
This paper presents a novel defense mechanism, CrowdGuard, that effectively mitigates backdoor attacks in Federated Learning.
CrowdGuard employs a server-located stacked clustering scheme to enhance its resilience to rogue client feedback.
The evaluation results demonstrate that CrowdGuard achieves a 100% True-Positive-Rate and True-Negative-Rate across various scenarios.
arXiv Detail & Related papers (2022-10-14T11:27:49Z) - Protecting Semantic Segmentation Models by Using Block-wise Image
Encryption with Secret Key from Unauthorized Access [13.106063755117399]
We propose to protect semantic segmentation models from unauthorized access by utilizing block-wise transformation with a secret key.
Experiment results show that the proposed protection method allows rightful users with the correct key to access the model to full capacity and deteriorate the performance for unauthorized users.
arXiv Detail & Related papers (2021-07-20T09:31:15Z) - Passport-aware Normalization for Deep Model Protection [122.61289882357022]
We propose a new passport-aware normalization formulation for deep learning models.
It only needs to add another passport-aware branch for IP protection.
It is demonstrated to be robust not only to common attack techniques like fine-tuning and model compression, but also to ambiguity attacks.
arXiv Detail & Related papers (2020-10-29T17:57:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.