PrivateLoRA For Efficient Privacy Preserving LLM
- URL: http://arxiv.org/abs/2311.14030v1
- Date: Thu, 23 Nov 2023 14:36:30 GMT
- Title: PrivateLoRA For Efficient Privacy Preserving LLM
- Authors: Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang
- Abstract summary: We propose a novel Large Language Model (LLM) service paradigm that distributes privacy-sensitive computation to edge devices and shared computation to the cloud.
Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations.
Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models.
- Score: 20.750808913757396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End users face a choice between privacy and efficiency in current Large
Language Model (LLM) service paradigms. In cloud-based paradigms, users are
forced to compromise data locality for generation quality and processing speed.
Conversely, edge device paradigms maintain data locality but fail to deliver
satisfactory performance. In this work, we propose a novel LLM service paradigm
that distributes privacy-sensitive computation on edge devices and shared
computation in the cloud. Only activations are transmitted between the central
cloud and edge devices to ensure data locality. Our core innovation,
PrivateLoRA, addresses the challenging communication overhead by exploiting the
low rank of residual activations, achieving over 95% communication reduction.
Consequently, PrivateLoRA effectively maintains data locality and is extremely
resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput
over 300% of device-only solutions for 7B models and over 80% of an A100 GPU
for 33B models. PrivateLoRA also provides tuning performance comparable to LoRA
for advanced personalization. Our approach democratizes access to
state-of-the-art generative AI for edge devices, paving the way for more
tailored LLM experiences for the general public. To our knowledge, our proposed
framework is the first efficient and privacy-preserving LLM solution in the
literature.
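The abstract describes a split execution scheme: shared computation stays in the cloud, a small privacy-sensitive component runs on the device, and only low-rank residual activations cross the network. The NumPy sketch below illustrates that exchange; the dimensions, the placement of the down/up projections A and B on the cloud, and the trainable r x r device matrix M are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's exact configuration):
# hidden size d roughly matches a 7B model; the adapter rank r is much smaller.
d, r, seq_len = 4096, 64, 128
rng = np.random.default_rng(0)

# Cloud side: frozen shared weights.
W = rng.standard_normal((d, d)) * 0.02   # frozen base projection (shared computation)
A = rng.standard_normal((d, r)) * 0.02   # rank-r down-projection
B = rng.standard_normal((r, d)) * 0.02   # rank-r up-projection

# Device side: small private component (hypothetical trainable r x r matrix).
M = np.eye(r)

def cloud_forward(x):
    """Heavy shared path plus the rank-r activation sent to the device."""
    return x @ W, x @ A                  # (seq_len, d) stays in the cloud; (seq_len, r) is transmitted

def device_adapter(low_rank_act):
    """Device applies its private adapter; raw data and M never leave the device."""
    return low_rank_act @ M              # (seq_len, r) is sent back to the cloud

def cloud_merge(shared, device_act):
    """Cloud lifts the rank-r result back to d dimensions and adds the residual."""
    return shared + device_act @ B

x = rng.standard_normal((seq_len, d))    # stand-in for a hidden state
shared, down = cloud_forward(x)
up = device_adapter(down)                # the network round trip carries only rank-r tensors
y = cloud_merge(shared, up)

# Per crossing, r floats are transmitted instead of d, i.e. 1 - r/d = 1 - 64/4096 ≈ 98.4%
# less traffic than shipping full hidden states in this illustrative setting.
print(y.shape, f"reduction ≈ {1 - r / d:.1%}")
```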
Related papers
- PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs [5.063806958859058]
On mobile devices, the wealth of valuable, non-public data generated daily holds great promise for locally fine-tuning personalized LLMs.
We propose employing derivative-free optimization techniques to enable on-device fine-tuning of LLMs, even on memory-limited mobile devices.
Empirical results demonstrate that the RoBERTa-large model and OPT-1.3B can be fine-tuned locally on the OPPO Reno 6 smartphone using around 4GB and 6.5GB of memory respectively.
arXiv Detail & Related papers (2024-07-01T07:26:56Z)
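PocketLLM's on-device tuning relies on derivative-free optimization. For reference, below is a minimal sketch of one common derivative-free technique, two-point zeroth-order gradient estimation, which needs only forward passes and therefore very little memory; the function names and the toy quadratic loss are illustrative, and PocketLLM's actual optimizer may differ.

```python
import numpy as np

def zeroth_order_step(params, loss_fn, lr=1e-2, eps=1e-3, seed=0):
    """One derivative-free update: probe the loss at two antithetic random
    perturbations and move along the sampled direction. Only forward passes
    are needed, so no backprop activations or optimizer states are stored."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)  # random probe direction
    grad_est = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    return params - lr * grad_est * z

# Toy usage: fit a quadratic. In PocketLLM the loss would come from the model's
# forward pass on private on-device data; this stand-in just shows the mechanics.
target = np.array([1.0, -2.0, 0.5])
loss = lambda p: float(np.sum((p - target) ** 2))
params = np.zeros(3)
for step in range(2000):
    params = zeroth_order_step(params, loss, seed=step)
print(params)  # close to `target`, found without any analytic gradient
```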
- FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model [48.33280660752336]
Large language models (LLMs) show amazing performance on many domain-specific tasks after fine-tuning with some appropriate data.
Many domain-specific data are privately distributed across multiple owners.
We introduce FedBiOT, a resource-efficient LLM fine-tuning approach to federated learning.
arXiv Detail & Related papers (2024-06-25T16:45:47Z)
- PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN [19.014325509263536]
ChatGPT marks the arrival of the large language model (LLM) era.
PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token.
arXiv Detail & Related papers (2024-05-29T04:06:50Z)
- DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model [17.688874383440208]
We propose a distributed parameter-efficient fine-tuning framework called DLoRA.
DLoRA enables scalable PEFT operations to be performed collaboratively between the cloud and user devices.
We show that DLoRA can significantly reduce the computation and communication workload over the user devices while achieving superior accuracy and privacy protection.
arXiv Detail & Related papers (2024-04-08T04:14:02Z)
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z)
- Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework [23.933367972846312]
This paper proposes a novel privacy-preserving TFGC framework based on Federated Learning, dubbed PRFL.
We demonstrate the effectiveness of the proposed PRFL on the classical TFGC task by leveraging four public datasets.
arXiv Detail & Related papers (2024-01-03T01:45:00Z)
- Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes from 7B to 65B improve, on average, by 8.2% over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
- Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models [62.838689691468666]
We propose Federated Black-Box Prompt Tuning (Fed-BBPT) to optimally harness each local dataset.
Fed-BBPT capitalizes on a central server that aids local users in collaboratively training a prompt generator through regular aggregation.
Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines.
arXiv Detail & Related papers (2023-10-04T19:30:49Z)
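The Fed-BBPT entry above says a central server aggregates a collaboratively trained prompt generator but does not spell out the aggregation rule. Below is a minimal sketch assuming FedAvg-style weighted parameter averaging; the names, shapes, and client sizes are illustrative.

```python
import numpy as np

def aggregate(client_params, client_sizes):
    """One server-side aggregation round: size-weighted average of the clients'
    prompt-generator parameters. The server only ever sees these parameters,
    never the clients' raw data."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, client_params))

# Toy round with three clients, each holding a small locally trained parameter
# vector (random stand-ins for prompt-generator weights).
rng = np.random.default_rng(0)
clients = [rng.standard_normal(16) for _ in range(3)]
global_params = aggregate(clients, client_sizes=[100, 250, 50])
print(global_params.shape)  # (16,) -- the updated global prompt generator
```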
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system that unlocks the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, peer variability, and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that preserve privacy by distorting model parameters.
It can achieve a personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z)
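The last entry above protects privacy by distorting model parameters, with a per-parameter, per-client, per-round trade-off. Below is a minimal sketch of one common distortion mechanism, adding Gaussian noise to a local update before it is shared; the function name and noise scale are illustrative assumptions, not the paper's specific mechanism.

```python
import numpy as np

def distort(update, noise_scale=0.05, seed=0):
    """Perturb a client's parameter update with Gaussian noise before sharing it.
    A larger noise_scale gives stronger protection at a higher utility cost."""
    rng = np.random.default_rng(seed)
    return update + rng.normal(scale=noise_scale, size=update.shape)

# Toy round: the server only receives the distorted update.
local_update = np.random.default_rng(1).standard_normal(8)
shared_update = distort(local_update, noise_scale=0.05)
print(np.abs(shared_update - local_update).max())  # on the order of noise_scale
```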
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.