SLIP: Securing LLMs IP Using Weights Decomposition
- URL: http://arxiv.org/abs/2407.10886v3
- Date: Sat, 01 Nov 2025 18:59:22 GMT
- Title: SLIP: Securing LLMs IP Using Weights Decomposition
- Authors: Yehonathan Refael, Adam Hakim, Lev Greenberg, Satya Lokam, Tal Aviv, Ben Fishman, Shachar Seidman, Racchit Jain, Jay Tenenbaum
- Abstract summary: The high cost of cloud-based deployment has spurred interest in running models on edge devices. We introduce SLIP, a hybrid inference algorithm designed to protect edge-deployed models from theft.
- Score: 3.097016309594195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have recently seen widespread adoption in both academia and industry. As these models grow, they become valuable intellectual property (IP), reflecting substantial investments by their owners. The high cost of cloud-based deployment has spurred interest in running models on edge devices, but this risks exposing parameters to theft and unauthorized use. Existing approaches to protect model IP on the edge trade off practicality, accuracy, or deployment requirements. We introduce SLIP, a hybrid inference algorithm designed to protect edge-deployed models from theft. SLIP is, to our knowledge, the first hybrid protocol that is both practical for real-world applications and provably secure, while incurring zero accuracy degradation and minimal latency overhead. It partitions the model across two computing resources: one secure but expensive, and one cost-effective but vulnerable. Using matrix decomposition, the secure resource retains the most sensitive portion of the model's IP while performing only a small fraction of the computation; the vulnerable resource executes the remainder. The protocol includes security guarantees that prevent attackers from using the partition to infer the protected information. Finally, we present experimental results that demonstrate the robustness and effectiveness of our method, positioning it as a compelling solution for protecting LLMs.
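To make the decomposition idea concrete, here is a minimal NumPy sketch, assuming an SVD-style split: the top singular components of a weight matrix stay on the secure resource, the residual is shipped to the vulnerable device, and the two partial products are summed at inference time so the output matches the original matrix multiply exactly. The function names and the choice of decomposition are illustrative assumptions; SLIP's actual construction and its security guarantees are defined in the paper.

```python
import numpy as np

def split_weights(W: np.ndarray, k: int):
    """Illustrative split of a weight matrix into a small 'secure' part
    (top-k singular components) and a 'vulnerable' residual.
    A hedged sketch, not SLIP's exact decomposition."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_secure = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # retained on the trusted resource
    W_vulnerable = W - W_secure                        # deployed on the edge device
    return W_secure, W_vulnerable

def hybrid_forward(x, W_secure, W_vulnerable):
    # Each resource computes its partial product; summing the results
    # reproduces y = W @ x, so there is no accuracy degradation.
    y_edge = W_vulnerable @ x      # cheap, untrusted computation
    y_trusted = W_secure @ x       # small, trusted computation
    return y_edge + y_trusted

# Sanity check under these assumptions:
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
x = rng.normal(size=64)
Ws, Wv = split_weights(W, k=4)
assert np.allclose(hybrid_forward(x, Ws, Wv), W @ x)
```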
Related papers
- Amulet: Fast TEE-Shielded Inference for On-Device Model Protection [15.936694312917512]
On-device machine learning (ML) introduces new security concerns about model privacy. Storing valuable trained ML models on user devices exposes them to potential extraction by adversaries. We propose Amulet, a fast TEE-shielded on-device inference framework for ML model protection.
arXiv Detail & Related papers (2025-12-08T12:22:51Z) - SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment [9.666696979829359]
SecureInfer is a framework that keeps privacy-critical tensors inside a trusted execution environment (TEE) while offloading compute-intensive operations to untrusted accelerators. We implement a prototype of SecureInfer using the LLaMA-2 model and evaluate it across performance and security metrics.
arXiv Detail & Related papers (2025-10-22T19:17:31Z) - Speculative Safety-Aware Decoding [46.78651034593231]
We introduce Speculative Safety-Aware Decoding (SSD), a lightweight decoding-time approach that equips LLMs with the desired safety property while accelerating inference. SSD integrates speculative sampling during decoding and leverages the match ratio between the small and composite models to quantify jailbreak risks. Experimental results show that SSD successfully equips the large model with the desired safety property and keeps it helpful on benign queries.
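The "match ratio" above can be pictured as the fraction of draft-model tokens that the larger, safety-equipped model accepts during speculative decoding. The helper below is a hypothetical illustration of that quantity only; SSD's actual risk criterion and its integration into decoding are defined in the paper.

```python
def match_ratio(draft_tokens: list[int], target_tokens: list[int]) -> float:
    """Fraction of positions where the small draft model and the larger
    model agree. A low ratio on a query could be read as a jailbreak-risk
    signal; the exact definition and threshold here are assumptions."""
    n = min(len(draft_tokens), len(target_tokens))
    if n == 0:
        return 0.0
    agree = sum(d == t for d, t in zip(draft_tokens[:n], target_tokens[:n]))
    return agree / n
```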
arXiv Detail & Related papers (2025-08-25T07:30:10Z) - LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning [61.594212398272184]
Low-Rank Extrapolation (LoX) improves robustness against benign and malicious fine-tuning attacks. LoX leads to 11% to 54% absolute reductions in attack success rates.
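One way to picture low-rank extrapolation, stated purely as an assumption: take the weight update produced by safety alignment, keep its dominant low-rank directions, and push the model slightly further along them. The sketch below illustrates that idea; it is not LoX's verbatim procedure.

```python
import numpy as np

def low_rank_extrapolate(W_base, W_aligned, rank: int, alpha: float):
    """Hedged sketch: amplify the top-`rank` singular directions of the
    alignment update (W_aligned - W_base) by a factor alpha."""
    delta = W_aligned - W_base
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    delta_low_rank = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]
    return W_aligned + alpha * delta_low_rank
```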
arXiv Detail & Related papers (2025-06-18T16:30:02Z) - FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs). FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs. We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z) - MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark (Diverse, Simple, and Categorized), an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse.
Current safety measures are typically limited to text-based filtering or concept removal strategies, which can remove only a few concepts from the model's generative capabilities.
We introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO).
We train safety experts, in the form of low-rank adaptation (LoRA) matrices, that guide the generation process away from specific safety-related concepts.
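For reference, the generic Direct Preference Optimization objective that SafetyDPO builds on can be written as a small PyTorch function. Treating safe generations as the preferred responses is an assumption made here for illustration; the paper's expert training and data construction are not reproduced.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss over log-probabilities of preferred (e.g. safe) and
    dispreferred (e.g. unsafe) generations under the policy and a frozen reference."""
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```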
arXiv Detail & Related papers (2024-12-13T18:59:52Z) - The Communication-Friendly Privacy-Preserving Machine Learning against Malicious Adversaries [14.232901861974819]
Privacy-preserving machine learning (PPML) is an innovative approach that allows for secure data analysis while safeguarding sensitive information.
We introduce efficient protocol for secure linear function evaluation.
We extend the protocol to handle linear and non-linear layers, ensuring compatibility with a wide range of machine-learning models.
arXiv Detail & Related papers (2024-11-14T08:55:14Z) - Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability.
Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% attack success rate (ASR) on various open-source LLMs.
It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z) - CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment [66.72332011814183]
CoreGuard is a computation- and communication-efficient protection method for proprietary large language models (LLMs) deployed on edge devices. CoreGuard uses an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead.
arXiv Detail & Related papers (2024-10-16T08:14:24Z) - Position: On-Premises LLM Deployment Demands a Middle Path: Preserving Privacy Without Sacrificing Model Confidentiality [18.575663556525864]
We argue that deploying closed-source LLMs within user-controlled infrastructure enhances data privacy and mitigates misuse risks.
A well-designed on-premises deployment must ensure model confidentiality by preventing model theft, and it must offer privacy-preserving customization.
Our findings demonstrate that privacy and confidentiality can coexist, paving the way for secure on-premises AI deployment.
arXiv Detail & Related papers (2024-10-15T02:00:36Z) - PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z) - Jailbreaking as a Reward Misspecification Problem [80.52431374743998]
We propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process.
We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness.
We present ReMiss, a system for automated red teaming that generates adversarial prompts in a reward-misspecified space.
arXiv Detail & Related papers (2024-06-20T15:12:27Z) - Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks [59.46556573924901]
This paper introduces Defensive Prompt Patch (DPP), a novel prompt-based defense mechanism for large language models (LLMs).
Unlike previous approaches, DPP is designed to achieve a minimal Attack Success Rate (ASR) while preserving the high utility of LLMs.
Empirical results conducted on LLAMA-2-7B-Chat and Mistral-7B-Instruct-v0.2 models demonstrate the robustness and adaptability of DPP.
arXiv Detail & Related papers (2024-05-30T14:40:35Z) - RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z) - MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection [18.99205251538783]
MAsk Pruning (MAP) is a framework for locating and pruning target-related parameters in a well-trained model.
MAP freezes the source model and learns a target-specific binary mask to prevent unauthorized data usage.
Extensive experiments indicate that MAP yields new state-of-the-art performance.
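As a rough illustration of the mask-pruning idea, the snippet below zeroes out parameters selected by a learned binary mask in a frozen model; how MAP learns that mask is the paper's contribution and is not shown here. The function name and data layout are assumptions.

```python
import torch

def apply_binary_mask(model: torch.nn.Module, masks: dict[str, torch.Tensor]) -> None:
    """Zero out parameters flagged by a {0,1} mask of the same shape,
    degrading capability on the protected target domain (illustrative only)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name].to(param.dtype))
```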
arXiv Detail & Related papers (2024-03-07T02:10:59Z) - EncryIP: A Practical Encryption-Based Framework for Model Intellectual Property Protection [17.655627250882805]
This paper introduces a practical encryption-based framework called EncryIP.
It seamlessly integrates a public-key encryption scheme into the model learning process.
It demonstrates superior effectiveness in both training protected models and efficiently detecting the unauthorized spread of ML models.
arXiv Detail & Related papers (2023-12-19T11:11:03Z) - MirrorNet: A TEE-Friendly Framework for Secure On-device DNN Inference [14.08010398777227]
Deep neural network (DNN) models have become prevalent in edge devices for real-time inference.
Existing defense approaches fail to fully safeguard model confidentiality or result in significant latency issues.
This paper presents MirrorNet, which generates a TEE-friendly implementation for any given DNN model to protect the model confidentiality.
In evaluation, MirrorNet achieves an 18.6% accuracy gap between authenticated and illegal use while introducing only 0.99% hardware overhead.
arXiv Detail & Related papers (2023-11-16T01:21:19Z) - Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy [8.8480262507008]
We propose PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods.
We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao.
Experiments show that PEA can train a differentially private classification model with 88% accuracy on CIFAR-10 within 7 minutes in the LAN setting.
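For orientation, a plaintext DP-SGD step (clip each per-example gradient, then add Gaussian noise) looks roughly as below; PEA runs this kind of update inside secure multi-party computation, which this NumPy sketch makes no attempt to reproduce.

```python
import numpy as np

def dpsgd_step(per_example_grads, params, clip_norm, noise_multiplier, lr, rng):
    """One plaintext DP-SGD update (illustrative): per-example clipping plus
    Gaussian noise with standard deviation clip_norm * noise_multiplier."""
    clipped = []
    for g in per_example_grads:
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    grad = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * grad
```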
arXiv Detail & Related papers (2022-08-18T06:48:25Z) - Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
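In the vertical setting, two parties hold disjoint feature columns of the same samples, and the logit is simply the sum of their partial linear scores. The two-party layout below is a hypothetical illustration of that data split, not the paper's protocol or its privacy analysis.

```python
import numpy as np

def vlr_logits(xA, wA, xB, wB, bias=0.0):
    """Party A and party B each hold part of the features and the matching
    weights; neither sees the other's raw inputs in the actual protocol."""
    return xA @ wA + xB @ wB + bias

def predict_proba(xA, wA, xB, wB, bias=0.0):
    z = vlr_logits(xA, wA, xB, wB, bias)
    return 1.0 / (1.0 + np.exp(-z))
```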
arXiv Detail & Related papers (2022-07-19T05:47:30Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
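One plausible reading of a "relaxed loss with a more achievable learning target", offered only as an assumption: stop driving the training loss below a chosen target value, which narrows the train/test loss gap that membership-inference attacks exploit. RelaxLoss's full procedure differs in its details.

```python
import torch

def relaxed_loss(ce_loss: torch.Tensor, target_loss: float) -> torch.Tensor:
    """Minimize the loss normally while it is above the target; reverse the
    gradient once it falls below, so training does not overfit past the target.
    Illustrative sketch only, not the exact RelaxLoss objective."""
    if ce_loss.item() >= target_loss:
        return ce_loss
    return -ce_loss
```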
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.