Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
- URL: http://arxiv.org/abs/2509.09091v1
- Date: Thu, 11 Sep 2025 01:54:13 GMT
- Title: Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
- Authors: Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu
- Abstract summary: This paper proposes CMIF, a Confidential and efficient Model Inference Framework. CMIF confidentially deploys the embedding layer in the client-side TEE and subsequent layers on GPU servers. It optimizes the Report-Noisy-Max mechanism to protect sensitive inputs with only a slight decrease in model performance.
- Score: 11.22744810136105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: CPU-based trusted execution environments (TEEs) and differential privacy (DP) have gained wide applications for private inference. Due to high inference latency in TEEs, researchers use partition-based approaches that offload linear model components to GPUs. However, dense nonlinear layers of large language models (LLMs) result in significant communication overhead between TEEs and GPUs. DP-based approaches apply random noise to protect data privacy, but this compromises LLM performance and semantic understanding. To overcome the above drawbacks, this paper proposes CMIF, a Confidential and efficient Model Inference Framework. CMIF confidentially deploys the embedding layer in the client-side TEE and subsequent layers on GPU servers. Meanwhile, it optimizes the Report-Noisy-Max mechanism to protect sensitive inputs with a slight decrease in model performance. Extensive experiments on Llama-series models demonstrate that CMIF reduces additional inference overhead in TEEs while preserving user data privacy.
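The abstract does not spell out how CMIF modifies Report-Noisy-Max, but the standard mechanism it builds on is well known. As background, here is a minimal sketch of classic Report-Noisy-Max: add Laplace noise to each candidate score and release only the index of the noisy maximum. The function names and the toy scores are illustrative, not taken from the paper.

```python
import math
import random

def _laplace(scale, rng):
    # The difference of two Exp(1) variates is a centred Laplace(0, 1) variate.
    e1 = -math.log(1.0 - rng.random())
    e2 = -math.log(1.0 - rng.random())
    return scale * (e1 - e2)

def report_noisy_max(scores, epsilon, sensitivity=1.0, seed=None):
    """Report-Noisy-Max: perturb each score with Lap(sensitivity/epsilon)
    noise and return only the argmax index. For monotone counting queries
    with sensitivity 1 this satisfies epsilon-differential privacy
    (Dwork & Roth, Claim 3.9)."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    noisy = [s + _laplace(scale, rng) for s in scores]
    return max(range(len(scores)), key=noisy.__getitem__)

# With a very loose privacy budget the noise is negligible, so the
# mechanism almost surely returns the true argmax (index 1 here).
winner = report_noisy_max([0.0, 10.0, 1.0], epsilon=1e6, seed=0)
```

A key property exploited by such selection mechanisms is that only the winning index is released, never the noisy scores themselves, which is what keeps the privacy cost at a single epsilon rather than one per candidate.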
Related papers
- NeuroFilter: Privacy Guardrails for Conversational LLM Agents [50.75206727081996]
This work addresses the computational challenge of enforcing privacy for agentic Large Language Models (LLMs). NeuroFilter is a guardrail framework that operationalizes contextual integrity by mapping norm violations to simple directions in the model's activation space. A comprehensive evaluation across over 150,000 interactions, covering models from 7B to 70B parameters, illustrates the strong performance of NeuroFilter.
arXiv Detail & Related papers (2026-01-21T05:16:50Z) - PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration [8.776463501718737]
We propose a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module on the edge selects an execution mode - cloud, edge, or collaboration; (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt.
arXiv Detail & Related papers (2025-11-27T22:32:33Z) - DP-FedLoRA: Privacy-Enhanced Federated Fine-Tuning for On-Device Large Language Models [17.265217612125905]
DP-FedLoRA is a privacy-enhanced federated fine-tuning framework. It integrates LoRA-based adaptation with differential privacy in a communication-efficient setting. We show that DP-FedLoRA delivers competitive performance while offering strong privacy guarantees.
arXiv Detail & Related papers (2025-09-11T02:16:34Z) - FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs). FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs. We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z) - The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large AI model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z) - A performance analysis of VM-based Trusted Execution Environments for Confidential Federated Learning [0.0]
Federated Learning (FL) is a distributed machine learning approach that has emerged as an effective way to address recent privacy concerns. FL introduces the need for additional security measures, as FL alone is still subject to vulnerabilities such as model and data poisoning and inference attacks. Confidential Computing (CC) is a paradigm that, by leveraging hardware-based trusted execution environments (TEEs), protects the confidentiality and integrity of ML models and data.
arXiv Detail & Related papers (2025-01-20T15:58:48Z) - DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning Based on Constant-Overhead Linear Secret Resharing [51.336015600778396]
We introduce the distributed matrix mechanism to achieve the best of both worlds: the stronger privacy of distributed DP and the better utility of the matrix mechanism. We accomplish this using a novel cryptographic protocol that securely transfers sensitive values across client committees of different training iterations with constant communication overhead.
arXiv Detail & Related papers (2024-10-21T16:25:14Z) - Split-and-Denoise: Protect large language model inference with local differential privacy [2.572566198588905]
Split-N-Denoise (SnD) is a private inference framework that splits the model to execute the token embedding layer on the client side at minimal computational cost.
We show SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks.
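The split-inference idea in SnD (and, similarly, CMIF's TEE-resident embedding layer) can be sketched as follows. The toy embedding table, the Gaussian noise, and the mean-pooling "server" are illustrative assumptions only; the actual papers use calibrated local-DP noise and full transformer stacks.

```python
import random

# Toy 2-dimensional embedding table; a real LLM would hold a learned matrix.
EMBED = {"hello": [0.2, 0.1], "world": [0.4, 0.3]}

def client_embed(tokens, noise_scale=0.05, seed=0):
    # Client/TEE side: look up embeddings locally and perturb them before
    # anything leaves the device. (Gaussian noise is used here purely for
    # illustration; SnD calibrates its noise for local differential privacy.)
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_scale) for v in EMBED[t]] for t in tokens]

def server_forward(noisy_embeddings):
    # Server/GPU side: runs the remaining layers on the perturbed vectors;
    # a single mean-pool stands in for the transformer stack here.
    dim = len(noisy_embeddings[0])
    n = len(noisy_embeddings)
    return [sum(e[i] for e in noisy_embeddings) / n for i in range(dim)]

# Only noisy embeddings cross the client/server boundary; raw tokens never do.
pooled = server_forward(client_embed(["hello", "world"]))
```

The design point both papers share is that the privacy-critical step (token lookup) stays on trusted hardware, while the compute-heavy layers run on untrusted accelerators.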
arXiv Detail & Related papers (2023-10-13T14:17:33Z) - Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z) - Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) is proposed to maximize the cache hit rates of devices in the MEC networks.
We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z) - Differentially private federated deep learning for multi-site medical
image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.