Related papers: SLIP: Securing LLMs IP Using Weights Decomposition

SLIP: Securing LLMs IP Using Weights Decomposition

URL: http://arxiv.org/abs/2407.10886v2
Date: Thu, 1 Aug 2024 20:34:18 GMT
Title: SLIP: Securing LLMs IP Using Weights Decomposition
Authors: Yehonathan Refael, Adam Hakim, Lev Greenberg, Tal Aviv, Satya Lokam, Ben Fishman, Shachar Seidman,
Abstract summary: Large language models (LLMs) have recently seen widespread adoption, in both academia and industry. As these models grow, they become valuable intellectual property (IP), reflecting enormous investments by their owners. Current methods to protect models' IP on the edge have limitations in terms of practicality, loss in accuracy, or suitability to requirements. We introduce a novel hybrid inference algorithm, named SLIP, designed to protect edge-deployed models from theft.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have recently seen widespread adoption, in both academia and industry. As these models grow, they become valuable intellectual property (IP), reflecting enormous investments by their owners. Moreover, the high cost of cloud-based deployment has driven interest towards deployment to edge devices, yet this risks exposing valuable parameters to theft and unauthorized use. Current methods to protect models' IP on the edge have limitations in terms of practicality, loss in accuracy, or suitability to requirements. In this paper, we introduce a novel hybrid inference algorithm, named SLIP, designed to protect edge-deployed models from theft. SLIP is the first hybrid protocol that is both practical for real-world applications and provably secure, while having zero accuracy degradation and minimal impact on latency. It involves partitioning the model between two computing resources, one secure but expensive, and another cost-effective but vulnerable. This is achieved through matrix decomposition, ensuring that the secure resource retains a maximally sensitive portion of the model's IP while performing a minimal amount of computations, and vice versa for the vulnerable resource. Importantly, the protocol includes security guarantees that prevent attackers from exploiting the partition to infer the secured information. Finally, we present experimental results that show the robustness and effectiveness of our method, positioning it as a compelling solution for protecting LLMs.

Related papers

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning [61.594212398272184]
Low-Rank Extrapolation (LoX) improves robustness against benign and malicious fine-tuning attacks.<n>LoX leads to 11% to 54% absolute reductions in attack success rates.
arXiv Detail & Related papers (2025-06-18T16:30:02Z)
FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs)<n>FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs.<n>We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept removal strategies, able to remove just a few concepts from the model's generative capabilities. We introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO) We train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related
arXiv Detail & Related papers (2024-12-13T18:59:52Z)
The Communication-Friendly Privacy-Preserving Machine Learning against Malicious Adversaries [14.232901861974819]
Privacy-preserving machine learning (PPML) is an innovative approach that allows for secure data analysis while safeguarding sensitive information. We introduce efficient protocol for secure linear function evaluation. We extend the protocol to handle linear and non-linear layers, ensuring compatibility with a wide range of machine-learning models.
arXiv Detail & Related papers (2024-11-14T08:55:14Z)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
Position: On-Premises LLM Deployment Demands a Middle Path: Preserving Privacy Without Sacrificing Model Confidentiality [18.575663556525864]
We argue that deploying closed-source LLMs within user-controlled infrastructure enhances data privacy and mitigates misuse risks. A well-designed on-premises deployment must ensure model confidentiality -- by preventing model theft -- and offer privacy-preserving customization. Our findings demonstrate that privacy and confidentiality can coexist, paving the way for secure on-premises AI deployment.
arXiv Detail & Related papers (2024-10-15T02:00:36Z)
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data. The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
Jailbreaking as a Reward Misspecification Problem [80.52431374743998]
We propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness. We present ReMiss, a system for automated red teaming that generates adversarial prompts in a reward-misspecified space.
arXiv Detail & Related papers (2024-06-20T15:12:27Z)
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks [59.46556573924901]
This paper introduces Defensive Prompt Patch (DPP), a novel prompt-based defense mechanism for large language models (LLMs) Unlike previous approaches, DPP is designed to achieve a minimal Attack Success Rate (ASR) while preserving the high utility of LLMs. Empirical results conducted on LLAMA-2-7B-Chat and Mistral-7B-Instruct-v0.2 models demonstrate the robustness and adaptability of DPP.
arXiv Detail & Related papers (2024-05-30T14:40:35Z)
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection [18.99205251538783]
MAsk Pruning (MAP) is a framework for locating and pruning target-related parameters in a well-trained model. MAP freezes the source model and learns a target-specific binary mask to prevent unauthorized data usage. Extensive experiments indicate that MAP yields new state-of-the-art performance.
arXiv Detail & Related papers (2024-03-07T02:10:59Z)
EncryIP: A Practical Encryption-Based Framework for Model Intellectual Property Protection [17.655627250882805]
This paper introduces a practical encryption-based framework called textitEncryIP. It seamlessly integrates a public-key encryption scheme into the model learning process. It demonstrates superior effectiveness in both training protected models and efficiently detecting the unauthorized spread of ML models.
arXiv Detail & Related papers (2023-12-19T11:11:03Z)
MirrorNet: A TEE-Friendly Framework for Secure On-device DNN Inference [14.08010398777227]
Deep neural network (DNN) models have become prevalent in edge devices for real-time inference. Existing defense approaches fail to fully safeguard model confidentiality or result in significant latency issues. This paper presents MirrorNet, which generates a TEE-friendly implementation for any given DNN model to protect the model confidentiality. For the evaluation, MirrorNet can achieve a 18.6% accuracy gap between authenticated and illegal use, while only introducing 0.99% hardware overhead.
arXiv Detail & Related papers (2023-11-16T01:21:19Z)
Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy [8.8480262507008]
We propose PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods. We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao. Experiments show that PEA can train a differentially private classification model with an accuracy of 88% for CIFAR-10 within 7 minutes under the LAN setting.
arXiv Detail & Related papers (2022-08-18T06:48:25Z)
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch descent gradient. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.