Related papers: Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation

Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation

URL: http://arxiv.org/abs/2412.16537v2
Date: Sat, 25 Jan 2025 14:10:34 GMT
Title: Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation
Authors: Yuntian Chen, Zhanyong Tang, Tianpei Lu, Bingsheng Zhang, Zhiying Shi, Zheng Wang,
Abstract summary: We present FASTLMPI, a new approach to accelerate private TBM inference through fine-grained optimization.<n>Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLULU.<n>FASTLMPI shows a remarkable 54% to 64% decrease in runtime and an impressive 72.2% reduction in communication costs.
Score: 8.859237832459876
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Homomorphic encryption (HE) and secret sharing (SS) enable computations on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors like medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach to accelerate private TBM inference through fine-grained computation optimization. Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. In addition, FASTLMPI introduces a precise segmented approximation technique for differentiable non-linear, improving its fitting accuracy while maintaining a low polynomial degree. Compared to solution BOLT (S&P'24), FASTLMPI shows a remarkable 54% to 64% decrease in runtime and an impressive 72.2% reduction in communication costs.

Related papers

Privacy-Preserving Inference for Quantized BERT Models [13.36359444231145]
Quantization offers a promising solution by converting floating-point operations into lower-precision integer computations.<n>We propose a fine-grained, layer-wise quantization scheme and support 1-bit weight fully connected layers in a secure setting.
arXiv Detail & Related papers (2025-08-03T07:52:08Z)
PriFFT: Privacy-preserving Federated Fine-tuning of Large Language Models via Hybrid Secret Sharing [20.148411915688175]
Fine-tuning large language models (LLMs) raises privacy concerns due to the risk of exposing sensitive training data.<n>Recent studies show that adversaries can still infer private information in FL.<n>We propose PriFFT, a privacy-preserving federated fine-tuning mechanism.
arXiv Detail & Related papers (2025-03-05T03:41:57Z)
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems [89.35169042718739]
collaborative inference enables end users to leverage powerful deep learning models without exposure of sensitive raw data to cloud servers. Recent studies have revealed that these intermediate features may not sufficiently preserve privacy, as information can be leaked and raw data can be reconstructed via model inversion attacks (MIAs) This work first theoretically proves that the conditional entropy of inputs given intermediate features provides a guaranteed lower bound on the reconstruction mean square error (MSE) under any MIA. Then, we derive a differentiable and solvable measure for bounding this conditional entropy based on the Gaussian mixture estimation and propose a conditional entropy algorithm to enhance the inversion robustness
arXiv Detail & Related papers (2025-03-01T07:15:21Z)
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs [74.74225314708225]
Multi-head Latent Attention (MLA) is an innovative architecture designed to ensure efficient and economical inference. This paper proposes the first data-efficient fine-tuning method for transitioning from Multi-Head Attention to MLA.
arXiv Detail & Related papers (2025-02-20T18:50:42Z)
Progressive Mixed-Precision Decoding for Efficient LLM Inference [49.05448842542558]
We introduce Progressive Mixed-Precision Decoding (PMPD) to address the memory-boundedness of decoding. PMPD achieves 1.4$-$12.2$times$ speedup in matrix-vector multiplications over fp16 models. Our approach delivers a throughput gain of 3.8$-$8.0$times$ over fp16 models and up to 1.54$times$ over uniform quantization approaches.
arXiv Detail & Related papers (2024-10-17T11:46:33Z)
Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy [47.997934291881414]
Existing mean estimation schemes are usually optimized for $L_infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L$ geometry. We introduce a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP. Unlike previous approaches, our accounting algorithm directly operates in $L$ geometry, yielding MSEs that fast converge to those of the Gaussian mechanism.
arXiv Detail & Related papers (2024-05-02T03:48:47Z)
Batch-oriented Element-wise Approximate Activation for Privacy-Preserving Neural Networks [5.039738753594332]
Homomorphic Encryption (FHE) is facing a great challenge that homomorphic operations cannot be easily adapted for non-linear activation calculations. Batch-oriented element-wise data packing and approximate activation are proposed, which train linear low-degrees to approximate the non-linear activation function - ReLU. Experiment results show that when ciphertext inference is performed on 4096 input images, compared with the current most efficient channel-wise method, the inference accuracy is improved by 1.65%, and the amortized inference time is reduced by 99.5%.
arXiv Detail & Related papers (2024-03-16T13:26:33Z)
SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC [34.63351580241698]
We introduce a comprehensive PPI framework called SecFormer to achieve fast and accurate PPI for Transformer models.<n>In terms of efficiency, SecFormer is 3.57 and 3.58 times faster than PUMA for BERT$_textBASE$ and BERT$_textLARGE$, demonstrating its effectiveness and speed.
arXiv Detail & Related papers (2024-01-01T15:40:35Z)
Practical Privacy-Preserving Gaussian Process Regression via Secret Sharing [23.80837224347696]
This paper proposes a privacy-preserving GPR method based on secret sharing (SS) We derive a new SS-based exponentiation operation through the idea of 'confusion-correction' and construct an SS-based matrix inversion algorithm based on Cholesky decomposition. Empirical results show that our proposed method can achieve reasonable accuracy and efficiency under the premise of preserving data privacy.
arXiv Detail & Related papers (2023-06-26T08:17:51Z)
Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability. We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP) More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is on the demand of cloud service users. We introduce $textitTHE-X$, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
Local Differential Privacy for Bayesian Optimization [12.05395706770007]
We consider a black-box optimization in the nonparametric Gaussian process setting with local differential privacy (LDP) guarantee. Specifically, the rewards from each user are further corrupted to protect privacy and the learner only has access to the corrupted rewards to minimize the regret. We present three almost optimal algorithms based on the GP-UCB framework and Laplace DP mechanism.
arXiv Detail & Related papers (2020-10-13T21:50:09Z)
Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users. An adversary may still be able to infer the private training data by attacking the released model. Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.