Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption
- URL: http://arxiv.org/abs/2501.01672v2
- Date: Tue, 07 Jan 2025 05:36:41 GMT
- Title: Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption
- Authors: Zhang Ruoyan, Zheng Zhongxiang, Bao Wankang,
- Abstract summary: We combine Fully Homomorphic Encryption (FHE) and provable security theory with Parameter-Efficient Fine-Tuning (PEFT) to propose an efficient and secure inference scheme for large language models.
In this paper, we use the open-source model ChatGLM2-6B as the base model which is fine-tuned by LoRA.
Experimental results show that the inference efficiency of our scheme reaches 1.61s/token, which shows that the scheme has good practicality.
- Score: 0.0
- Abstract: Large language models (LLMs) are currently at the forefront of the machine learning field; they show broad application prospects but also expose risks of privacy leakage. We combine Fully Homomorphic Encryption (FHE) and provable security theory with Parameter-Efficient Fine-Tuning (PEFT) to propose an efficient and secure inference scheme for LLMs. More specifically, we focus on pre-trained LLMs that rely on an open-source base model and are then fine-tuned on private datasets by LoRA. This is a popular roadmap for vertical-domain models such as LawGPT and BenTsao. We use two key techniques. First, we divide the whole model into a public part and a private part. The weights of the public part are publicly accessible (e.g., the open-source base model), while the private part needs to be protected (e.g., the LoRA matrices). In this way, the overhead brought by computing on private data can be greatly reduced. Second, we propose a general method to transform a linear layer into another one that provides security against model extraction attacks and preserves its original functionality, which we denote as a Private Linear Layer (PLL). We then apply this method to the LoRA matrices so that the server protects its private weights without restricting the user's input. We also show that the difficulty of performing model extraction attacks on PLL can be reduced to the well-known hard problem Learning with Errors (LWE). Combining this method with FHE, we can protect the user's input at the same time. In this paper, we use the open-source model ChatGLM2-6B as the base model, which is fine-tuned by LoRA. Experimental results show that the inference efficiency of our scheme reaches 1.61s/token, which shows that the scheme has good practicality.
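The public/private split described in the abstract can be illustrated with a minimal sketch. This is not the paper's PLL construction and it involves no FHE. It only shows, under assumed toy dimensions, how a LoRA-adapted linear layer separates into a path over the public base weights and a path over the private low-rank matrices, and why the private path is much cheaper. The names `lora_forward`, `matvec`, and `random_matrix`, and the sizes `d_in`, `d_out`, and `rank`, are hypothetical and chosen only for illustration.

```python
import random

def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W_public, A_private, B_private):
    """Return W x + B (A x), keeping the public and private paths separate."""
    public_out = matvec(W_public, x)                       # public base weights
    private_out = matvec(B_private, matvec(A_private, x))  # private LoRA matrices
    return [p + q for p, q in zip(public_out, private_out)]

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_out, rank = 64, 32, 4  # illustrative sizes, not taken from the paper
W = random_matrix(d_out, d_in, rng)  # stands in for the open-source base layer
A = random_matrix(rank, d_in, rng)   # stands in for the LoRA down-projection
B = random_matrix(d_out, rank, rng)  # stands in for the LoRA up-projection
x = [rng.uniform(-1, 1) for _ in range(d_in)]

y = lora_forward(x, W, A, B)

# Multiplications needed on each path for one input vector.
public_mults = d_out * d_in            # 2048
private_mults = rank * (d_in + d_out)  # 384
print(len(y), public_mults, private_mults)  # prints: 32 2048 384
```

In this toy setting the private path needs 384 multiplications against 2048 for the public path, which is the kind of gap that makes it attractive to reserve the expensive protected computation for the LoRA matrices alone.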
Related papers
- Label Privacy in Split Learning for Large Models with Parameter-Efficient Training [51.28799334394279]
We search for a way to fine-tune models over an API while keeping the labels private.
We propose P$^3$EFT, a multi-party split learning algorithm that takes advantage of existing PEFT properties to maintain privacy at a lower performance overhead.
arXiv Detail & Related papers (2024-12-21T15:32:03Z) - Encryption-Friendly LLM Architecture [11.386436468650016]
Homomorphic encryption (HE) is a cryptographic protocol supporting arithmetic computations in encrypted states.
We propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning.
arXiv Detail & Related papers (2024-10-03T13:48:35Z) - Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation [50.837277466987345]
We focus on the field of large language models (LLMs) for recommendation.
We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users.
We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
arXiv Detail & Related papers (2024-08-07T04:20:28Z) - ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets [8.483679748399037]
This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity.
We propose ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient usage of confidential computing.
arXiv Detail & Related papers (2024-07-03T09:54:08Z) - Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models [51.20476412037321]
We propose Safe LoRA, a simple one-liner patch to the original LoRA implementation by introducing the projection of LoRA weights from selected layers to the safety-aligned subspace.
Our experiments demonstrate that when fine-tuning on purely malicious data, Safe LoRA retains similar safety performance as the original aligned model.
arXiv Detail & Related papers (2024-05-27T05:04:05Z) - Continual Forgetting for Pre-trained Vision Models [70.51165239179052]
In real-world scenarios, selective information is expected to be continuously removed from a pre-trained model.
We propose Group Sparse LoRA (GS-LoRA) for efficient and effective deleting.
We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes.
arXiv Detail & Related papers (2024-03-18T07:33:56Z) - Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs).
New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks.
In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance.
arXiv Detail & Related papers (2024-02-26T20:41:50Z) - A Fast, Performant, Secure Distributed Training Framework For Large Language Model [8.547104574876887]
We propose a secure distributed LLM based on model slicing.
We deploy the Trusted Execution Environment (TEE) on both the client and server side.
Secure communication is executed in the TEE and general environments through lightweight encryption.
arXiv Detail & Related papers (2024-01-18T08:33:09Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy [8.8480262507008]
We propose PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods.
We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao.
Experiments show that PEA can train a differentially private classification model with an accuracy of 88% for CIFAR-10 within 7 minutes under the LAN setting.
arXiv Detail & Related papers (2022-08-18T06:48:25Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)