Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices
- URL: http://arxiv.org/abs/2503.14932v1
- Date: Wed, 19 Mar 2025 06:38:51 GMT
- Title: Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices
- Authors: Ziyao Wang, Yexiao He, Zheyu Shen, Yu Li, Guoheng Sun, Myungjin Lee, Ang Li
- Abstract summary: Large Language Models (LLMs) can be adapted to specialized domains using private datasets stored on resource-constrained edge devices. We propose Prada, a privacy-preserving and efficient black-box LLM adaptation system using private on-device datasets. Prada achieves performance comparable to centralized fine-tuning methods while significantly reducing computational overhead by up to 60% and communication costs by up to 80%.
- Score: 16.500721672193762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable abilities in various natural language processing tasks. However, adapting these models to specialized domains using private datasets stored on resource-constrained edge devices, such as smartphones and personal computers, remains challenging due to significant privacy concerns and limited computational resources. Existing model adaptation methods either compromise data privacy by requiring data transmission or jeopardize model privacy by exposing proprietary LLM parameters. To address these challenges, we propose Prada, a novel privacy-preserving and efficient black-box LLM adaptation system using private on-device datasets. Prada employs a lightweight proxy model fine-tuned with Low-Rank Adaptation (LoRA) locally on user devices. During inference, Prada leverages the logits offset, i.e., the difference in outputs between the base and adapted proxy models, to iteratively refine outputs from a remote black-box LLM. This offset-based adaptation approach preserves both data privacy and model privacy, as there is no need to share sensitive data or proprietary model parameters. Furthermore, we incorporate speculative decoding to further speed up inference, making Prada practically deployable on bandwidth-constrained edge devices. Extensive experiments on various downstream tasks demonstrate that Prada achieves performance comparable to centralized fine-tuning methods while significantly reducing computational overhead by up to 60% and communication costs by up to 80%.
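A minimal sketch of the offset-based decoding step described in the abstract, assuming the remote black-box LLM exposes next-token logits and that both proxy models share its vocabulary; the three callables and the `alpha` knob are illustrative assumptions, not the paper's API:

```python
# Hedged sketch: greedy decoding steered by the local proxy's logits offset.
import numpy as np

def offset_decode(prompt_ids, remote_logits, base_proxy_logits, tuned_proxy_logits,
                  max_new_tokens=32, alpha=1.0, eos_id=None):
    """Each callable maps a token-id sequence to a vocabulary-sized logit vector."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # The offset captures what local LoRA fine-tuning changed in the proxy's predictions.
        offset = tuned_proxy_logits(ids) - base_proxy_logits(ids)
        combined = remote_logits(ids) + alpha * offset  # alpha=0 recovers the unadapted remote LLM
        next_id = int(np.argmax(combined))
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

As written, the sketch issues one remote query per generated token; the speculative decoding mentioned in the abstract would instead let the local proxy draft several tokens per remote round trip, which is what reduces communication on bandwidth-constrained devices.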
Related papers
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering [68.91862701376155]
CoSteer is a novel collaborative framework that enables decoding-time personalization through localized delta steering. We formulate token-level optimization as an online learning problem, where local delta vectors dynamically adjust the remote LLM's logits. This approach preserves privacy by transmitting only the final steered tokens rather than raw data or intermediate vectors.
arXiv Detail & Related papers (2025-07-07T08:32:29Z) - Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection [5.174560360759384]
Personalization of Large Language Models (LLMs) is important in practical applications to accommodate the individual needs of different mobile users.
We present XPerT, a new technique that ensures proper selection of such already-personalized LLMs based on explainability of how they were fine-tuned.
Experiment results show that XPerT reduces the costs of on-device LLM personalization by 83%, and improves its data efficiency by 51%.
arXiv Detail & Related papers (2025-04-15T17:38:06Z) - PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints. PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint. Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
arXiv Detail & Related papers (2025-03-15T15:11:17Z) - Split Adaptation for Pre-trained Vision Transformers [17.25680016753443]
We propose a novel split adaptation (SA) method that enables effective downstream adaptation while protecting data and models.
Unlike regular split learning (SL), SA replaces parameters with low-bit quantized values, preventing direct exposure of the model.
SA incorporates data-level and model-level out-of-distribution enhancements to mitigate noise injection's impact on adaptation performance.
arXiv Detail & Related papers (2025-03-01T10:38:53Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model (a minimal weighted-Lasso sketch of this penalty-factor idea appears after this list).
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression [13.702472186412296]
Offsite-tuning is a privacy-preserving method for tuning large language models. We propose ScaleOT, a novel privacy-utility-scalable offsite-tuning framework.
arXiv Detail & Related papers (2024-12-13T03:00:48Z) - Model-based Large Language Model Customization as Service [34.949731264918846]
Large Language Model (LLM) services from providers like OpenAI and Google excel at general tasks but often underperform on domain-specific applications. We introduce Llamdex, a novel framework that facilitates LLM customization as a service, where the client uploads pre-trained domain-specific models rather than data. Experiments demonstrate that Llamdex improves domain-specific accuracy by up to 26% over state-of-the-art private data synthesis methods under identical privacy constraints.
arXiv Detail & Related papers (2024-10-14T13:18:20Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - PrivateLoRA For Efficient Privacy Preserving LLM [20.750808913757396]
We propose a novel Large Language Model (LLM) service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud.
Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations.
Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models.
arXiv Detail & Related papers (2023-11-23T14:36:30Z) - Split-and-Denoise: Protect large language model inference with local differential privacy [2.572566198588905]
Split-N-Denoise (SnD) is a private inference framework that splits the model to execute the token embedding layer on the client side at minimal computational cost (a minimal sketch of this client-side split-and-perturb step appears after this list).
We show SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks.
arXiv Detail & Related papers (2023-10-13T14:17:33Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
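The LLM-Lasso entry above describes converting LLM-judged feature relevance into per-feature penalty weights. Below is a minimal sketch of one standard way to realize such weights with an off-the-shelf Lasso (rescaling each feature by the inverse of its penalty factor); the hard-coded penalty vector stands in for LLM-generated factors, and the paper's prompting and weight-conversion scheme is not reproduced:

```python
# Hedged sketch: weighted Lasso via feature rescaling; penalty factors are stand-ins
# for LLM-generated relevance judgments (lower factor = judged more relevant).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

penalty = np.array([0.2, 0.2, 1.0, 1.0, 1.0])   # hypothetical LLM-derived factors

# Scaling feature j by 1/penalty[j] makes the uniform L1 penalty act as
# penalty[j] * |beta_j| on the original coefficients.
X_scaled = X / penalty
model = Lasso(alpha=0.1).fit(X_scaled, y)
beta = model.coef_ / penalty                     # map coefficients back to the original scale
print(beta)                                      # lightly penalized (relevant) features are more likely to survive
```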
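The Split-and-Denoise entry describes running the token embedding layer on the client and protecting what leaves the device with local differential privacy. Here is a minimal sketch of that client-side step under a Laplace mechanism with an illustrative noise scale; SnD's actual noise calibration and server-side denoising are not reproduced:

```python
# Hedged sketch: client-side embedding plus local-DP-style perturbation before upload.
import numpy as np

def client_embed_and_perturb(token_ids, embedding_table, epsilon=1.0, clip=1.0):
    """Embed tokens locally, clip their norm, and add Laplace noise before sending to the server."""
    emb = embedding_table[np.asarray(token_ids)]                  # (seq_len, dim)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb * np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # L2 clipping
    scale = clip / epsilon  # illustrative scale; real calibration depends on the sensitivity analysis
    noise = np.random.laplace(0.0, scale, size=emb.shape)
    return emb + noise      # only this perturbed tensor leaves the device
```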