Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption
- URL: http://arxiv.org/abs/2505.07329v1
- Date: Mon, 12 May 2025 08:14:33 GMT
- Title: Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption
- Authors: Jordan Frery, Roman Bredehoft, Jakub Klemsa, Arthur Meyre, Andrei Stoian,
- Abstract summary: Homomorphic Encryption (HE) protects the confidentiality of training data. This work introduces an interactive protocol adapting the Low-Rank Adaptation (LoRA) technique for private fine-tuning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preserving data confidentiality during the fine-tuning of open-source Large Language Models (LLMs) is crucial for sensitive applications. This work introduces an interactive protocol adapting the Low-Rank Adaptation (LoRA) technique for private fine-tuning. Homomorphic Encryption (HE) protects the confidentiality of training data and gradients handled by remote worker nodes performing the bulk of computations involving the base model weights. The data owner orchestrates training, requiring minimal local computing power and memory, thus alleviating the need for expensive client-side GPUs. We demonstrate feasibility by fine-tuning a Llama-3.2-1B model, presenting convergence results using HE-compatible quantization and performance benchmarks for HE computations on GPU hardware. This approach enables applications such as confidential knowledge base question answering, private codebase fine-tuning for AI code assistants, AI agents for drafting emails based on a company's email archive, and adapting models to analyze sensitive legal or healthcare documents.
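As a rough illustration of the split described in the abstract, the sketch below writes out the standard LoRA forward pass, y = xW + (alpha / r) * (xA)B, as two parts: a product with the frozen base weights (the work the abstract assigns to remote worker nodes) and a small low-rank adapter product. This is a sketch under stated assumptions, not the paper's protocol. No encryption is performed, `remote_base_matmul` is a hypothetical stand-in for the point where an HE computation would run, keeping the adapters on the data owner's side is an assumption, and all shapes and values are invented for illustration.

```python
# Illustrative sketch only: plaintext LoRA forward pass split into a
# "remote" base-weight product and a "local" low-rank adapter product.
# No homomorphic encryption is performed here; remote_base_matmul is a
# hypothetical placeholder, not an API from the paper or any library.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def remote_base_matmul(x, w):
    """Hypothetical worker step: x @ W with frozen base weights W.
    In a private setting x would be encrypted before being sent."""
    return matmul(x, w)

def lora_forward(x, w, a, b, alpha):
    """Standard LoRA output: x @ W + (alpha / r) * (x @ A) @ B."""
    r = len(b)                           # LoRA rank = number of rows of B
    base = remote_base_matmul(x, w)      # heavy part, uses base weights
    delta = matmul(matmul(x, a), b)      # cheap low-rank part (assumed local)
    scale = alpha / r
    return [[bv + scale * dv for bv, dv in zip(base_row, delta_row)]
            for base_row, delta_row in zip(base, delta)]

# Toy values, invented for illustration.
x = [[1.0, 2.0]]                 # one input row, d_in = 2
w = [[1.0, 0.0], [0.0, 1.0]]     # frozen base weights (identity)
a = [[1.0], [1.0]]               # LoRA A: d_in x r, with r = 1
b = [[0.5, -0.5]]                # LoRA B: r x d_out
y = lora_forward(x, w, a, b, alpha=2.0)
```

With these toy values the base product is `[[1.0, 2.0]]` and the scaled adapter term is `[[3.0, -3.0]]`, so `y` is `[[4.0, -1.0]]`. The point of the split is that only the base-weight product touches the large matrix W, while the adapter matrices have rank r and stay small.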
Related papers
- HE-LRM: Encrypted Deep Learning Recommendation Models using Fully Homomorphic Encryption [3.0841649700901117]
Fully Homomorphic Encryption (FHE) is an encryption scheme that not only encrypts data but also allows for computations to be applied directly on the encrypted data.
In this paper, we explore the challenges and opportunities when applying FHE to Deep Learning Recommendation Models (DLRM).
We develop novel methods for performing compressed embedding lookups in order to reduce FHE computational costs while keeping the underlying model performant.
arXiv Detail & Related papers (2025-06-22T19:40:04Z)
- Secure Distributed Learning for CAVs: Defending Against Gradient Leakage with Leveled Homomorphic Encryption [0.0]
Homomorphic Encryption (HE) offers a promising alternative to Differential Privacy (DP) and Secure Multi-Party Computation (SMPC).
We evaluate various HE schemes to identify the most suitable for Federated Learning (FL) in resource-constrained environments.
We develop a full HE-based FL pipeline that effectively mitigates Deep Leakage from Gradients (DLG) attacks while preserving model accuracy.
arXiv Detail & Related papers (2025-06-09T16:12:18Z)
- FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs).
FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs.
We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z)
- Is Compression Really Linear with Code Intelligence? [60.123628177110206]
Format Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably.
Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC).
Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
arXiv Detail & Related papers (2025-05-16T16:59:14Z)
- Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption [0.0]
We combine Fully Homomorphic Encryption (FHE) and provable security theory with Parameter-Efficient Fine-Tuning (PEFT) to propose an efficient and secure inference scheme for large language models.
In this paper, we use the open-source model ChatGLM2-6B as the base model, which is fine-tuned by LoRA.
Experimental results show the inference efficiency of our scheme reaches 1.61s/, which shows that the scheme has good practicality.
arXiv Detail & Related papers (2025-01-03T07:19:23Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [76.59316249991657]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Encryption-Friendly LLM Architecture [11.386436468650016]
Homomorphic encryption (HE) is a cryptographic protocol supporting arithmetic computations in encrypted states.
We propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning.
arXiv Detail & Related papers (2024-10-03T13:48:35Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- When approximate design for fast homomorphic computation provides differential privacy guarantees [0.08399688944263842]
Differential privacy (DP) and cryptographic primitives are popular countermeasures against privacy attacks.
In this paper, we design SHIELD, a probabilistic approximation algorithm for the argmax operator.
Even if SHIELD could have other applications, we here focus on one setting and seamlessly integrate it in the SPEED collaborative training framework.
arXiv Detail & Related papers (2023-04-06T09:38:01Z)
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- Q-LSTM Language Model -- Decentralized Quantum Multilingual Pre-Trained Language Model for Privacy Protection [6.0038761646405225]
Large-scale language models are trained on a massive amount of natural language data that might encode or reflect our private information.
Malicious agents can reverse engineer the training data even if data sanitation and differential privacy algorithms were involved in the pre-training process.
We propose a decentralized training framework to address privacy concerns in training large-scale language models.
arXiv Detail & Related papers (2022-10-06T21:29:17Z)
- Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field.
We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem.
We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)