Related papers: TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment

URL: http://arxiv.org/abs/2404.11121v1
Date: Wed, 17 Apr 2024 07:08:45 GMT
Title: TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
Authors: Qinfeng Li, Zhiqiang Shen, Zhenghan Qin, Yangfan Xie, Xuhong Zhang, Tianyu Du, Jianwei Yin
Abstract summary: We propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices. The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment. Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.
Score: 34.8682729537795
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Proprietary large language models (LLMs) have been widely applied in various scenarios. Additionally, deploying LLMs on edge devices is trending for efficiency and privacy reasons. However, edge deployment of proprietary LLMs introduces new security challenges: edge-deployed models are exposed as white-box accessible to users, enabling adversaries to conduct effective model stealing (MS) attacks. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify four critical protection properties that existing methods fail to simultaneously satisfy: (1) maintaining protection after a model is physically copied; (2) authorizing model access at request level; (3) safeguarding against runtime reverse engineering; (4) achieving high security with negligible runtime overhead. To address the above issues, we propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices. The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment, e.g., TEE. The authorization module can freshly authorize each request based on its input. Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.

Related papers

Zero-Trust Artificial Intelligence Model Security Based on Moving Target Defense and Content Disarm and Reconstruction [4.0208298639821525]
This paper examines the challenges in distributing AI models through model zoos and file transfer mechanisms. The physical security of model files is critical, requiring stringent access controls and attack prevention solutions. It demonstrates a 100% disarm rate while validated against known AI model repositories and actual malware attacks from the HuggingFace model zoo.
arXiv Detail & Related papers (2025-03-03T17:32:19Z)
PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models [51.458089902581456]
We introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images. Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
arXiv Detail & Related papers (2025-02-22T09:47:55Z)
Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking [34.479355499938116]
Large Language Models (LLMs) have led to significant applications but also introduced serious security threats. We introduce a black-box attack framework called AttackPrefixTree (APT). APT exploits structured output interfaces to dynamically construct attack patterns. Experiments on benchmark datasets indicate that this approach achieves a higher attack success rate than existing methods.
arXiv Detail & Related papers (2025-02-19T08:29:36Z)
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment [43.53211005936295]
CoreGuard is a computation- and communication-efficient model protection approach against model stealing on edge devices. We show that CoreGuard achieves the same security protection as the black-box security guarantees with negligible overhead.
arXiv Detail & Related papers (2024-10-16T08:14:24Z)
Position: On-Premises LLM Deployment Demands a Middle Path: Preserving Privacy Without Sacrificing Model Confidentiality [18.575663556525864]
We argue that deploying closed-source LLMs within user-controlled infrastructure enhances data privacy and mitigates misuse risks. A well-designed on-premises deployment must ensure model confidentiality -- by preventing model theft -- and offer privacy-preserving customization. Our findings demonstrate that privacy and confidentiality can coexist, paving the way for secure on-premises AI deployment.
arXiv Detail & Related papers (2024-10-15T02:00:36Z)
ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs [17.853862145962292]
We introduce a novel backdoor attack that systematically bypasses system prompts. Our method achieves an attack success rate (ASR) of up to 99.50% while maintaining a clean accuracy (CACC) of 98.58%.
arXiv Detail & Related papers (2024-10-05T02:58:20Z)
Safeguard is a Double-edged Sword: Denial-of-service Attack on Large Language Models [7.013820690538764]
We present a new denial-of-service (DoS) attack on large language models (LLMs). By software or phishing attacks on user client software, attackers insert a short, seemingly innocuous adversarial prompt into user prompt templates in configuration files. Our attack can automatically generate seemingly safe adversarial prompts, only about 30 characters long, that universally block over 97% of user requests on Llama Guard 3.
arXiv Detail & Related papers (2024-10-03T19:07:53Z)
Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks [27.11523234556414]
We propose a plug-and-play and easy-to-deploy jailbreak defense framework, namely Prefix Guidance (PG). PG guides the model to identify harmful prompts by directly setting the first few tokens of the model's output. We demonstrate the effectiveness of PG across three models and five attack methods.
arXiv Detail & Related papers (2024-08-15T14:51:32Z)
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training [67.30423823744506]
This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs). We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance with harmful prompts at any response position. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful
arXiv Detail & Related papers (2024-07-12T09:36:33Z)
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks [59.46556573924901]
This paper introduces Defensive Prompt Patch (DPP), a novel prompt-based defense mechanism for large language models (LLMs). Unlike previous approaches, DPP is designed to achieve a minimal Attack Success Rate (ASR) while preserving the high utility of LLMs. Empirical results on LLAMA-2-7B-Chat and Mistral-7B-Instruct-v0.2 models demonstrate the robustness and adaptability of DPP.
arXiv Detail & Related papers (2024-05-30T14:40:35Z)
ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks. Adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
AdaptGuard: Defending Against Universal Attacks for Model Adaptation [129.2012687550069]
We study the vulnerability to universal attacks transferred from the source domain during model adaptation algorithms. We propose a model preprocessing framework, named AdaptGuard, to improve the security of model adaptation algorithms.
arXiv Detail & Related papers (2023-03-19T07:53:31Z)
Protecting Semantic Segmentation Models by Using Block-wise Image Encryption with Secret Key from Unauthorized Access [13.106063755117399]
We propose to protect semantic segmentation models from unauthorized access by utilizing block-wise transformation with a secret key. Experimental results show that the proposed protection method allows rightful users with the correct key to access the model at full capacity while degrading performance for unauthorized users.
arXiv Detail & Related papers (2021-07-20T09:31:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.