Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models
- URL: http://arxiv.org/abs/2603.00196v1
- Date: Fri, 27 Feb 2026 06:37:07 GMT
- Title: Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models
- Authors: Chung-ju Huang, Huiqiang Zhao, Yuanpeng He, Lijian Li, Wenpin Jiao, Zhi Jin, Peixuan Chen, Leye Wang
- Abstract summary: Talaria is a confidential inference framework that partitions the Large Language Model pipeline to protect client data. Talaria executes sensitive, weight-independent operations within a client-controlled Confidential Virtual Machine. Talaria can defend against state-of-the-art token inference attacks, reducing token reconstruction accuracy from over 97.5% to an average of 1.34%.
- Score: 39.390624817461905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential privacy breaches by service providers. Existing approaches fail to ensure privacy, maintain model performance, and preserve computational efficiency simultaneously. To address this challenge, we propose Talaria, a confidential inference framework that partitions the LLM pipeline to protect client data without compromising the cloud's model intellectual property or inference quality. Talaria executes sensitive, weight-independent operations within a client-controlled Confidential Virtual Machine (CVM) while offloading weight-dependent computations to the cloud GPUs. The interaction between these environments is secured by our Reversible Masked Outsourcing (ReMO) protocol, which uses a hybrid masking technique to reversibly obscure intermediate data before outsourcing computations. Extensive evaluations show that Talaria can defend against state-of-the-art token inference attacks, reducing token reconstruction accuracy from over 97.5% to an average of 1.34%, all while being a lossless mechanism that guarantees output identical to the original model without significantly decreasing efficiency and scalability. To the best of our knowledge, this is the first work that ensures clients' prompts and responses remain inaccessible to the cloud, while also preserving model privacy, performance, and efficiency.
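The abstract does not detail ReMO's hybrid masking construction, but its core requirement, that intermediate data be reversibly obscured before a weight-dependent computation is outsourced, can be illustrated with a toy sketch. The example below masks an activation with a random scalar before an outsourced matrix multiply and exploits linearity to unmask the result; all function names are hypothetical, and the real protocol is necessarily more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cloud-side secret: model weights, never revealed to the client.
W = rng.standard_normal((4, 8))

def client_mask(x, rng):
    """Client (inside the CVM) hides the activation with a random
    non-zero scalar before outsourcing the weight-dependent matmul."""
    c = rng.uniform(1.0, 2.0)
    return c * x, c

def cloud_matmul(x_masked):
    """Cloud GPU computes the weight-dependent operation on masked data."""
    return W @ x_masked

def client_unmask(y_masked, c):
    """Client reverses the mask: W @ (c * x) / c == W @ x by linearity."""
    return y_masked / c

x = rng.standard_normal(8)            # sensitive intermediate activation
x_masked, c = client_mask(x, rng)
y = client_unmask(cloud_matmul(x_masked), c)

# Lossless: the unmasked result is identical to the unprotected computation.
assert np.allclose(y, W @ x)
```

The cloud only ever sees `c * x`, not `x`, yet the client recovers the exact output, consistent with the abstract's claim that the mechanism is lossless.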
Related papers
- ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version) [58.595691399741646]
Split Learning (SL) is a distributed learning approach that enables resource-constrained clients to collaboratively train deep neural networks (DNNs). This setup enables SL to leverage server capacities without sharing data, making it highly effective in resource-constrained environments dealing with sensitive data. We present ZORRO, a private, verifiable, and robust SL defense scheme.
arXiv Detail & Related papers (2025-09-11T18:44:09Z)
- Efficient and Verifiable Privacy-Preserving Convolutional Computation for CNN Inference with Untrusted Clouds [1.1545092788508224]
We propose a verifiable privacy-preserving scheme tailored for CNN convolutional layers. Our scheme enables efficient encryption and decryption, allowing resource-constrained clients to securely offload computations to the untrusted cloud server.
arXiv Detail & Related papers (2025-08-18T11:17:53Z)
- PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts [59.5243730853157]
Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns. Small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks. We propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework to balance computational cost, performance, and privacy protection under bandwidth constraints.
arXiv Detail & Related papers (2025-05-13T16:27:07Z)
- FedEM: A Privacy-Preserving Framework for Concurrent Utility Preservation in Federated Learning [17.853502904387376]
Federated Learning (FL) enables collaborative training of models across distributed clients without sharing local data, addressing privacy concerns in decentralized systems. We propose Federated Error Minimization (FedEM), a novel algorithm that incorporates controlled perturbations through adaptive noise injection. Experimental results on benchmark datasets demonstrate that FedEM significantly reduces privacy risks and preserves model accuracy, achieving a robust balance between privacy protection and utility preservation.
arXiv Detail & Related papers (2025-03-08T02:48:00Z)
- Communication-Efficient and Privacy-Adaptable Mechanism for Federated Learning [54.20871516148981]
We introduce the Communication-Efficient and Privacy-Adaptable Mechanism (CEPAM), which achieves communication efficiency and privacy protection simultaneously. We theoretically analyze the privacy guarantee of CEPAM and investigate the trade-off between user privacy and accuracy.
arXiv Detail & Related papers (2025-01-21T11:16:05Z)
- How Breakable Is Privacy: Probing and Resisting Model Inversion Attacks in Collaborative Inference [13.453033795109155]
Collaborative inference (CI) improves computational efficiency for edge devices by transmitting intermediate features to cloud models. However, there is no established criterion for assessing the difficulty of model inversion attacks (MIAs). We propose the first theoretical criterion to assess MIA difficulty in CI, identifying mutual information, entropy, and effective information volume as key influencing factors.
arXiv Detail & Related papers (2025-01-01T13:00:01Z)
- Privacy-Preserving Verifiable Neural Network Inference Service [4.131956503199438]
We develop vPIN, a privacy-preserving and verifiable CNN inference scheme that preserves privacy for client data samples.
vPIN achieves high efficiency in terms of proof size while providing client data privacy guarantees and provable verifiability.
arXiv Detail & Related papers (2024-11-12T01:09:52Z)
- ACCESS-FL: Agile Communication and Computation for Efficient Secure Aggregation in Stable Federated Learning Networks [26.002975401820887]
Federated Learning (FL) is a distributed learning framework designed for privacy-aware applications.
Traditional FL approaches risk exposing sensitive client data when plain model updates are transmitted to the server.
Google's Secure Aggregation (SecAgg) protocol addresses this threat by employing a double-masking technique.
We propose ACCESS-FL, a communication-and-computation-efficient secure aggregation method.
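The double-masking idea behind SecAgg, which ACCESS-FL builds on, rests on pairwise masks that cancel in the aggregate. The sketch below shows only that cancellation property for scalar updates; real SecAgg adds a second self-mask layer and key agreement for dropout handling, none of which is modeled here, and all names are hypothetical.

```python
import itertools
import random

random.seed(0)
P = 2**31 - 1                       # arithmetic over a prime field
updates = {1: 10, 2: 20, 3: 30}     # toy scalar model updates per client

# One shared random mask per client pair (i, j) with i < j.
pair_masks = {(i, j): random.randrange(P)
              for i, j in itertools.combinations(sorted(updates), 2)}

def masked_update(i):
    """Client i adds m_ij for pairs where it is first, subtracts where second,
    so every pairwise mask appears once with each sign across all clients."""
    y = updates[i]
    for (a, b), m in pair_masks.items():
        if a == i:
            y = (y + m) % P
        elif b == i:
            y = (y - m) % P
    return y

# The server sees only masked updates, yet their sum equals the true sum.
aggregate = sum(masked_update(i) for i in updates) % P
assert aggregate == sum(updates.values()) % P
```

Each individual `masked_update(i)` is statistically hidden by its masks, while the aggregate is exact because every mask is added exactly once and subtracted exactly once.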
arXiv Detail & Related papers (2024-09-03T09:03:38Z)
- Privacy-Preserving, Dropout-Resilient Aggregation in Decentralized Learning [3.9166000694570076]
Decentralized learning (DL) offers a novel paradigm in machine learning by distributing training across clients without central aggregation.
DL's peer-to-peer model raises challenges in protecting against inference attacks and privacy leaks.
This work proposes three secret sharing-based dropout resilience approaches for privacy-preserving DL.
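The abstract does not specify which secret-sharing constructions the three approaches use; a standard way to get dropout resilience is threshold (Shamir) sharing, where any t of n shares reconstruct the secret. The sketch below is a minimal illustration of that idea, not the paper's scheme.

```python
import random

random.seed(1)
P = 2**31 - 1  # prime modulus for field arithmetic

def share(secret, n, t):
    """Shamir t-of-n sharing: evaluate a random degree-(t-1)
    polynomial with f(0) = secret at points x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any t shares."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

shares = share(42, n=5, t=3)
# Two clients drop out: any 3 surviving shares still recover the value.
assert reconstruct(shares[:3]) == 42
assert reconstruct([shares[0], shares[2], shares[4]]) == 42
```

Because reconstruction needs only t of n shares, the aggregate survives up to n - t client dropouts, which is the resilience property the entry highlights.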
arXiv Detail & Related papers (2024-04-27T19:17:02Z)
- G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing.
FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity.
We present G$^2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z)
- Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.