DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
- URL: http://arxiv.org/abs/2510.16716v1
- Date: Sun, 19 Oct 2025 05:00:21 GMT
- Title: DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
- Authors: Asmita Mohanty, Gezheng Kang, Lei Gao, Murali Annavaram,
- Abstract summary: DistilLock is a TEE-assisted fine-tuning framework that enables privacy-preserving knowledge distillation on the edge.<n>We demonstrate that DistilLock prevents unauthorized knowledge distillation processes and model-stealing attacks.
- Score: 13.266175396099248
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated strong performance across diverse tasks, but fine-tuning them typically relies on cloud-based, centralized infrastructures. This requires data owners to upload potentially sensitive data to external servers, raising serious privacy concerns. An alternative approach is to fine-tune LLMs directly on edge devices using local data; however, this introduces a new challenge: the model owner must transfer proprietary models to the edge, which risks intellectual property (IP) leakage. To address this dilemma, we propose DistilLock, a TEE-assisted fine-tuning framework that enables privacy-preserving knowledge distillation on the edge. In DistilLock, a proprietary foundation model is executed within a trusted execution environment (TEE) enclave on the data owner's device, acting as a secure black-box teacher. This setup preserves both data privacy and model IP by preventing direct access to model internals. Furthermore, DistilLock employs a model obfuscation mechanism to offload obfuscated weights to untrusted accelerators for efficient knowledge distillation without compromising security. We demonstrate that DistilLock prevents unauthorized knowledge distillation processes and model-stealing attacks while maintaining high computational efficiency, but offering a secure and practical solution for edge-based LLM personalization.
Related papers
- Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models [39.390624817461905]
Talaria is a confidential inference framework that partitions the Large Language Models pipeline to protect client data.<n>Talaria executes sensitive, weight-independent operations within a client-controlled Confidential Virtual Machine.<n>Talaria can defend against state-of-the-art token inference attacks, reducing token reconstruction accuracy from over 97.5% to an average of 1.34%.
arXiv Detail & Related papers (2026-02-27T06:37:07Z) - SRFed: Mitigating Poisoning Attacks in Privacy-Preserving Federated Learning with Heterogeneous Data [5.7335377562335275]
Federated Learning (FL) enables collaborative model training without exposing clients' private data, and has been widely adopted in privacy-sensitive scenarios.<n>It faces two critical security threats: curious servers that may launch inference attacks to reconstruct clients' private data, and compromised clients that can launch poisoning attacks to disrupt model aggregation.<n>We propose SRFed, an efficient Byzantine-robust and privacy-preserving FL framework for Non-IID scenarios.
arXiv Detail & Related papers (2026-02-18T14:14:38Z) - SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models [48.62892618209157]
Large Language Models (LLMs) excel at various natural language processing tasks but remain vulnerable to jailbreaking attacks.<n>This paper explores aligning the model's inherent discrimination and generation capabilities.<n>Our method does not require any additional annotated data or external models during the training phase.
arXiv Detail & Related papers (2025-08-21T15:26:09Z) - PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models [51.458089902581456]
We introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images.<n>Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
arXiv Detail & Related papers (2025-02-22T09:47:55Z) - A Novel Access Control and Privacy-Enhancing Approach for Models in Edge Computing [0.26107298043931193]
We propose a novel model access control method tailored for edge computing environments.
This method leverages image style as a licensing mechanism, embedding style recognition into the model's operational framework.
By restricting the input data to the edge model, this approach not only prevents attackers from gaining unauthorized access to the model but also enhances the privacy of data on terminal devices.
arXiv Detail & Related papers (2024-11-06T11:37:30Z) - A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality [20.646221081945523]
Privacy-sensitive users require deploying large language models (LLMs) within their own infrastructure (on-premises) to safeguard private data and enable customization.<n>Previous research on small models has explored securing only the output layer within hardware-secured devices to balance model confidentiality and customization.<n>We propose SOLID, a novel deployment framework that secures a few bottom layers in a secure environment and introduces an efficient metric to optimize the trade-off.
arXiv Detail & Related papers (2024-10-15T02:00:36Z) - SLIP: Securing LLMs IP Using Weights Decomposition [3.097016309594195]
High cost of cloud-based deployment has spurred interest in running models on edge devices.<n>We introduce SLIP, a hybrid inference algorithm designed to protect edge-deployed models from theft.
arXiv Detail & Related papers (2024-07-15T16:37:55Z) - ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.<n> adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation.<n> Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z) - FheFL: Fully Homomorphic Encryption Friendly Privacy-Preserving Federated Learning with Byzantine Users [19.209830150036254]
federated learning (FL) technique was developed to mitigate data privacy issues in the traditional machine learning paradigm.
Next-generation FL architectures proposed encryption and anonymization techniques to protect the model updates from the server.
This paper proposes a novel FL algorithm based on a fully homomorphic encryption (FHE) scheme.
arXiv Detail & Related papers (2023-06-08T11:20:00Z) - CrowdGuard: Federated Backdoor Detection in Federated Learning [39.58317527488534]
This paper presents a novel defense mechanism, CrowdGuard, that effectively mitigates backdoor attacks in Federated Learning.
CrowdGuard employs a server-located stacked clustering scheme to enhance its resilience to rogue client feedback.
The evaluation results demonstrate that CrowdGuard achieves a 100% True-Positive-Rate and True-Negative-Rate across various scenarios.
arXiv Detail & Related papers (2022-10-14T11:27:49Z) - Over-the-Air Federated Learning with Privacy Protection via Correlated
Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z) - MOVE: Effective and Harmless Ownership Verification via Embedded External Features [104.97541464349581]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.<n>We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.<n>We then train a meta-classifier to determine whether a model is stolen from the victim.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Safe Distillation Box [62.32105311993915]
We propose a novel framework, termed as Safe Distillation Box (SDB), that allows us to wrap a pre-trained model in a virtual box for intellectual property protection.
SDB preserves the inference capability of the wrapped model to all users, but precludes KD from unauthorized users.
For authorized users, on the other hand, SDB carries out a knowledge augmentation scheme to strengthen the KD performances and the results of the student model.
arXiv Detail & Related papers (2021-12-05T05:01:55Z) - RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.