PrivateLoRA For Efficient Privacy Preserving LLM
- URL: http://arxiv.org/abs/2311.14030v1
- Date: Thu, 23 Nov 2023 14:36:30 GMT
- Title: PrivateLoRA For Efficient Privacy Preserving LLM
- Authors: Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang
- Abstract summary: We propose a novel Large Language Model (LLM) service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud.
Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations.
Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models.
- Score: 20.750808913757396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End users face a choice between privacy and efficiency in current Large
Language Model (LLM) service paradigms. In cloud-based paradigms, users are
forced to compromise data locality for generation quality and processing speed.
Conversely, edge device paradigms maintain data locality but fail to deliver
satisfactory performance. In this work, we propose a novel LLM service paradigm
that distributes privacy-sensitive computation on edge devices and shared
computation in the cloud. Only activations are transmitted between the central
cloud and edge devices to ensure data locality. Our core innovation,
PrivateLoRA, addresses the challenging communication overhead by exploiting the
low rank of residual activations, achieving over 95% communication reduction.
Consequently, PrivateLoRA effectively maintains data locality and is extremely
resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput
over 300% of device-only solutions for 7B models and over 80% of an A100 GPU
for 33B models. PrivateLoRA also provides tuning performance comparable to LoRA
for advanced personalization. Our approach democratizes access to
state-of-the-art generative AI for edge devices, paving the way for more
tailored LLM experiences for the general public. To our knowledge, our proposed
framework is the first efficient and privacy-preserving LLM solution in the
literature.
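The claimed communication reduction follows from simple arithmetic: if a layer's hidden dimension is d and the adapter rank is r, shipping rank-r activations in both directions costs roughly 2r/d of the bandwidth needed for full d-dimensional hidden states; with illustrative values d = 4096 and r = 16 (our assumption, not the paper's configuration), that is about 0.8%, well past a 95% reduction. The sketch below simulates both halves of one such split layer in a single process; all names are ours, and the exact factorization (a trainable r-by-r device-side map between frozen cloud-side down/up projections) illustrates the low-rank trick rather than reproducing the paper's precise parameterization.

```python
import torch
import torch.nn as nn

D_MODEL, RANK = 4096, 16  # illustrative sizes, not the paper's configuration

class CloudSide(nn.Module):
    """Cloud half: the heavy frozen base weight plus frozen low-rank
    down/up projections that compress activations before transmission."""
    def __init__(self, d: int = D_MODEL, r: int = RANK):
        super().__init__()
        self.base = nn.Linear(d, d, bias=False)  # shared base model weight W
        self.down = nn.Linear(d, r, bias=False)  # compress d -> r for downlink
        self.up = nn.Linear(r, d, bias=False)    # decompress r -> d after uplink
        for p in self.parameters():
            p.requires_grad_(False)              # nothing personal lives here

class DeviceSide(nn.Module):
    """Edge half: the only trainable parameters, a tiny r x r map that
    encodes the user's private adaptation and never leaves the device."""
    def __init__(self, r: int = RANK):
        super().__init__()
        self.m = nn.Linear(r, r, bias=False)

def split_forward(x: torch.Tensor, cloud: CloudSide, edge: DeviceSide) -> torch.Tensor:
    base_out = cloud.base(x)   # bulk compute stays in the cloud
    downlink = cloud.down(x)   # only [*, r] activations cross the network
    uplink = edge.m(downlink)  # private computation on the edge device
    return base_out + cloud.up(uplink)

x = torch.randn(1, 8, D_MODEL)  # one sequence of 8 tokens
y = split_forward(x, CloudSide(), DeviceSide())
print(y.shape)  # torch.Size([1, 8, 4096])
```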
Related papers
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering [68.91862701376155]
CoSteer is a novel collaborative framework that enables decoding-time personalization through localized delta steering.
We formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits.
This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors.
arXiv Detail & Related papers (2025-07-07T08:32:29Z)
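A minimal sketch of the decoding-time steering idea, with hypothetical names and sizes: the cloud model supplies next-token logits, the device adds its locally learned delta before sampling, and only the resulting token ID would ever be sent back.

```python
import torch

def steered_decode_step(remote_logits: torch.Tensor,
                        local_delta: torch.Tensor,
                        temperature: float = 1.0) -> int:
    """One decoding step: nudge the cloud model's logits with a locally
    learned delta vector, then sample. Private context never leaves the
    device; only the chosen token ID does."""
    logits = (remote_logits + local_delta) / temperature
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# e.g. over a vocabulary of 32k tokens (illustrative size):
token = steered_decode_step(torch.randn(32_000), torch.zeros(32_000))
```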
- FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs).
FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs.
We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z)
- PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs [56.04036826558497]
We introduce Privacy-preserving Collaborative Mixture-of-Experts (PC-MoE).
By design, PC-MoE synergistically combines the strengths of distributed computation with strong confidentiality assurances.
It almost matches (and sometimes exceeds) the performance and convergence rate of a fully centralized model, enjoys a near-70% reduction in peak GPU RAM, and is fully robust against reconstruction attacks.
arXiv Detail & Related papers (2025-06-03T15:00:18Z)
- PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts [59.5243730853157]
Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns.
Small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks.
We propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework to balance computational cost, performance, and privacy protection under bandwidth constraints.
arXiv Detail & Related papers (2025-05-13T16:27:07Z)
- LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration [43.115594451678255]
Cloud-device collaboration leverages on-cloud Large Language Models (LLMs) for handling public user queries and on-device Small Language Models (SLMs) for processing private user data.
Existing approaches often fail to fully leverage the scalable problem-solving capabilities of on-cloud LLMs.
We propose a Leader-Subordinate Retrieval framework for Privacy-preserving cloud-device collaboration (LSRP).
arXiv Detail & Related papers (2025-05-08T08:06:34Z)
- Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices [16.500721672193762]
Large Language Models (LLMs) can be adapted to specialized domains using private datasets stored on resource-constrained edge devices.
We propose Prada, a privacy-preserving and efficient black-box LLM adaptation system using private on-device datasets.
Prada achieves performance comparable to centralized fine-tuning methods while significantly reducing computational overhead by up to 60% and communication costs by up to 80%.
arXiv Detail & Related papers (2025-03-19T06:38:51Z)
- A Hybrid Swarm Intelligence Approach for Optimizing Multimodal Large Language Models Deployment in Edge-Cloud-based Federated Learning Environments [10.72166883797356]
The combination of Federated Learning (FL), Multimodal Large Language Models (MLLMs), and edge-cloud computing enables distributed and real-time data processing.
We propose a novel hybrid framework wherein MLLMs are deployed on edge devices equipped with sufficient resources and battery life, while the majority of training occurs in the cloud.
Our experimental results show that the proposed method significantly improves system performance, achieving an accuracy of 92%, reducing communication cost by 30%, and enhancing client participation.
arXiv Detail & Related papers (2025-02-04T03:03:24Z)
- TinyML NLP Approach for Semantic Wireless Sentiment Classification [49.801175302937246]
We introduce split learning (SL) as an energy-efficient, privacy-preserving tiny machine learning (TinyML) scheme.
Our results show that SL reduces processing power and CO2 emissions while maintaining high accuracy, whereas FL offers a balanced compromise between efficiency and privacy.
arXiv Detail & Related papers (2024-11-09T21:26:59Z)
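A toy illustration of the split-learning (SL) setup under assumed layer sizes: the first few layers run on the constrained device, and only their intermediate "smashed" activations, never the raw inputs, cross the wireless link to the server, which completes the computation.

```python
import torch
import torch.nn as nn

# Hypothetical cut point and sizes; real deployments choose these per model.
device_part = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # on the device
server_part = nn.Sequential(nn.Linear(32, 16), nn.ReLU(),
                            nn.Linear(16, 2))               # on the server

x_private = torch.randn(8, 64)    # raw features never leave the device
smashed = device_part(x_private)  # only these activations are transmitted
logits = server_part(smashed)     # server finishes inference (for training,
                                  # gradients flow back across the same cut)
```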
- CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration [1.6021932740447968]
Large Language Models (LLMs) have achieved remarkable success in serving end-users with human-like intelligence.
LLMs demand high computational resources, making it challenging to deploy them to satisfy various performance objectives.
We introduce CE-CoLLM, a novel cloud-edge collaboration framework that supports efficient and adaptive LLM inference for end-users at the edge.
arXiv Detail & Related papers (2024-11-05T06:00:27Z)
- Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning [13.26886445965894]
Pluto and Charon (PAC) is a time and memory efficient collaborative edge AI framework for personal LLMs fine-tuning.
PAC implements a personal LLMs fine-tuning technique that is efficient in terms of parameters, time, and memory.
Extensive evaluation based on prototype implementation demonstrates that PAC remarkably outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2024-08-20T11:30:12Z)
- Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation [50.837277466987345]
We focus on the field of large language models (LLMs) for recommendation.
We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users.
We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
arXiv Detail & Related papers (2024-08-07T04:20:28Z)
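Read in its simplest form, "independent LoRAs for different users" is a per-user lookup table of low-rank adapter pairs; the sketch below uses invented names and sizes to show the bookkeeping.

```python
import torch.nn as nn

class PerUserLoRA(nn.Module):
    """Keep one independent low-rank adapter (A, B pair) per user ID."""
    def __init__(self, d: int = 512, r: int = 8):
        super().__init__()
        self.d, self.r = d, r
        self.adapters = nn.ModuleDict()  # user_id -> that user's adapter

    def adapter_for(self, user_id: str) -> nn.Module:
        if user_id not in self.adapters:
            self.adapters[user_id] = nn.Sequential(
                nn.Linear(self.d, self.r, bias=False),  # A: d -> r
                nn.Linear(self.r, self.d, bias=False))  # B: r -> d
        return self.adapters[user_id]

# At serving time the base output would be corrected by the caller's adapter:
# h = base(x) + per_user.adapter_for(user_id)(x)
```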
- Mobile Edge Intelligence for Large Language Models: A Contemporary Survey [32.22789677882933]
Mobile edge intelligence (MEI) provides AI capabilities within the edge of mobile networks with improved privacy and latency relative to cloud computing.
MEI sits between on-device AI and cloud-based AI, featuring wireless communications and more powerful computing resources than end devices.
This article provides a contemporary survey on harnessing MEI for LLMs.
arXiv Detail & Related papers (2024-07-09T13:47:05Z)
- PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN [19.014325509263536]
ChatGPT marks the arrival of the large language model (LLM) era.
PermLLM achieves two-party private inference of the ChatGLM-6B model at a speed of around 3 s/token.
arXiv Detail & Related papers (2024-05-29T04:06:50Z)
- DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model [17.688874383440208]
We propose a distributed parameter-efficient fine-tuning framework called DLoRA.
DLoRA enables scalable PEFT operations to be performed collaboratively between the cloud and user devices.
We show that DLoRA can significantly reduce the computation and communication workload over the user devices while achieving superior accuracy and privacy protection.
arXiv Detail & Related papers (2024-04-08T04:14:02Z)
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z) - Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in
Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes from 7B to 65B improve, on average, 8.2% over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive
Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the variability and heterogeneity of peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - Theoretically Principled Federated Learning for Balancing Privacy and
Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters.
It can achieve a personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z)
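Taken literally, "distorting model parameters" means adding calibrated noise to a client's update before it is shared; the sketch below is our Gaussian-style illustration (not necessarily the paper's exact mechanism), with the noise scale free to vary per parameter, per client, and per round.

```python
import torch

def distort_update(update: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Distort a client's parameter update before upload. `sigma` has the
    same shape as `update`, so every individual parameter can get its own
    utility-privacy trade-off, chosen per client and per round."""
    return update + torch.randn_like(update) * sigma

update = torch.randn(1000)                 # a flattened client update
sigma = torch.full_like(update, 0.01)      # per-parameter noise scales
protected = distort_update(update, sigma)  # what actually leaves the client
```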
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.