ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression
- URL: http://arxiv.org/abs/2412.09812v1
- Date: Fri, 13 Dec 2024 03:00:48 GMT
- Title: ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression
- Authors: Kai Yao, Zhaorui Tan, Tiandi Ye, Lichun Li, Yuan Zhao, Wenyan Liu, Wei Wang, Jianke Zhu,
- Abstract summary: Offsite-tuning is a privacy-preserving method for tuning large language models. We propose ScaleOT, a novel privacy-utility-scalable offsite-tuning framework.
- Score: 13.702472186412296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offsite-tuning is a privacy-preserving method for tuning large language models (LLMs) by sharing a lossy compressed emulator from the LLM owners with data owners for downstream task tuning. This approach protects the privacy of both the model and data owners. However, current offsite tuning methods often suffer from adaptation degradation, high computational costs, and limited protection strength due to uniformly dropping LLM layers or relying on expensive knowledge distillation. To address these issues, we propose ScaleOT, a novel privacy-utility-scalable offsite-tuning framework that effectively balances privacy and utility. ScaleOT introduces a novel layerwise lossy compression algorithm that uses reinforcement learning to obtain the importance of each layer. It employs lightweight networks, termed harmonizers, to replace the raw LLM layers. By combining important original LLM layers and harmonizers in different ratios, ScaleOT generates emulators tailored for optimal performance with various model scales for enhanced privacy protection. Additionally, we present a rank reduction method to further compress the original LLM layers, significantly enhancing privacy with negligible impact on utility. Comprehensive experiments show that ScaleOT can achieve nearly lossless offsite tuning performance compared with full fine-tuning while obtaining better model privacy.
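As a rough illustration of the rank-reduction idea described in the abstract, the sketch below compresses a single layer weight matrix with a truncated SVD; the rank ratio, matrix shapes, and choice of layer are assumptions made for the example and are not taken from the paper.

```python
# Illustrative sketch of selective rank compression (not the paper's code).
# A weight matrix of a chosen LLM layer is approximated by a truncated SVD,
# keeping only the top singular directions; the kept ratio is an assumed knob.
import torch

def rank_compress(weight: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Return a low-rank approximation of `weight`, keeping `keep_ratio` of its rank."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    r = max(1, int(keep_ratio * S.numel()))
    # Reconstruct from the top-r singular triplets only.
    return (U[:, :r] * S[:r]) @ Vh[:r, :]

# Hypothetical usage: compress one projection of an emulator layer.
W = torch.randn(4096, 1024)              # stand-in for a raw LLM layer weight
W_low = rank_compress(W, keep_ratio=0.25)
print(torch.linalg.matrix_rank(W_low))   # at most 256 = 0.25 * min(4096, 1024)
```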
Related papers
- GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models [15.489070604001466]
This paper introduces a novel OT approach based on gradient-preserving compression, named GradOT. By analyzing the OT problem through the lens of optimization, we propose a method that selectively applies compression techniques such as rank compression and channel pruning, preserving the gradients of fine-tuned adapters while ensuring privacy.
arXiv Detail & Related papers (2025-07-06T16:27:27Z) - SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks [17.77094760401298]
We study the vulnerability of fine-tuned large language models to membership inference attacks (MIAs). We propose SOFT, a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection.
arXiv Detail & Related papers (2025-06-12T07:23:56Z) - FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs). FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs. We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z) - A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability [15.194518946737801]
We introduce FL-LLaMA, a secure, efficient, and adaptive federated split framework based on LLaMA2. We employ client-batch and server-hierarchical strategies to achieve parallel training, along with attention-mask compression and KV cache mechanisms to accelerate inference. Experiments on NLU, summarization and conversational QA tasks show that FL-LLaMA maintains performance comparable to centralized LLaMA2, and achieves up to 2x train speedups and 8x inference speedups.
arXiv Detail & Related papers (2025-05-21T15:58:08Z) - Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices [16.500721672193762]
Large Language Models (LLMs) can be adapted to specialized domains using private datasets stored on resource-constrained edge devices.
We propose Prada, a privacy-preserving and efficient black-box LLM adaptation system using private on-device datasets.
Prada achieves performance comparable to centralized fine-tuning methods while significantly reducing computational overhead by up to 60% and communication costs by up to 80%.
arXiv Detail & Related papers (2025-03-19T06:38:51Z) - Efficient Federated Fine-Tuning of Large Language Models with Layer Dropout [15.009864792277236]
Fine-tuning plays a crucial role in enabling pre-trained LLMs to evolve from general language comprehension to task-specific expertise.
This work proposes DropPEFT, an innovative federated PEFT framework that employs a novel transformer dropout method.
We show that DropPEFT can achieve a 1.3-6.3x speedup in model convergence and a 40%-67% reduction in memory footprint.
arXiv Detail & Related papers (2025-03-13T09:59:16Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
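As an illustration of the penalty-weighting step, the sketch below fits a Lasso with per-feature penalty factors via the standard column-rescaling reparameterization; the penalty values stand in for LLM-generated factors and are assumptions, not the paper's pipeline.

```python
# Hedged sketch of a weighted Lasso (penalty values are assumed; the paper's
# LLM-querying stage is replaced here by a hard-coded penalty vector).
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, penalty_factors, alpha=0.1):
    """Fit a Lasso whose L1 penalty on feature j is scaled by penalty_factors[j].

    Dividing column j by its penalty factor and fitting an ordinary Lasso is
    equivalent to applying a per-feature weighted L1 penalty.
    """
    w = np.asarray(penalty_factors, dtype=float)
    model = Lasso(alpha=alpha).fit(X / w, y)   # column-wise rescaling
    return model.coef_ / w                     # map back to the original scale

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
# Lower penalty (more LLM-judged relevance) on feature 0, higher elsewhere.
print(weighted_lasso(X, y, penalty_factors=[0.2, 1.0, 1.0, 1.0, 1.0]))
```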
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - Sparse Gradient Compression for Fine-Tuning Large Language Models [58.44973963468691]
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models.
High memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size.
We propose sparse gradient compression (SGC) to address these limitations.
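For intuition, a generic top-k gradient sparsification sketch is shown below; it illustrates the general idea of compressing gradients rather than the paper's specific SGC construction, and the tensor shapes and k are assumed.

```python
# Generic top-k gradient sparsification (illustrative only).
import math
import torch

def topk_compress(grad: torch.Tensor, k: int):
    """Keep only the k largest-magnitude gradient entries (indices + values)."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    """Rebuild a dense gradient with zeros everywhere except the kept entries."""
    dense = torch.zeros(math.prod(shape), dtype=vals.dtype)
    dense[idx] = vals
    return dense.reshape(shape)

g = torch.randn(4, 8)
idx, vals = topk_compress(g, k=5)
g_hat = topk_decompress(idx, vals, g.shape)
```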
arXiv Detail & Related papers (2025-02-01T04:18:28Z) - TinyML NLP Approach for Semantic Wireless Sentiment Classification [49.801175302937246]
We introduce split learning (SL) as an energy-efficient, privacy-preserving tiny machine learning (TinyML) scheme.
Our results show that SL reduces processing power and CO2 emissions while maintaining high accuracy, whereas federated learning (FL) offers a balanced compromise between efficiency and privacy.
arXiv Detail & Related papers (2024-11-09T21:26:59Z) - LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigating its growing memory footprint include: (1) efficient attention variants integrated in upcycling stages, and (2) KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
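A minimal sketch of the underlying idea, factorizing a key/value projection weight with a truncated SVD so that it can be swapped in without retraining, is shown below; the shapes, chosen rank, and the progressive per-layer schedule are assumptions for the example.

```python
# Illustrative low-rank factorization of a key/value projection weight
# (assumed shapes; not the LoRC code or its progressive compression strategy).
import torch

def low_rank_factorize(W: torch.Tensor, rank: int):
    """Split W (d_out x d_in) into two factors whose product approximates W."""
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    A = U[:, :rank] * S[:rank]   # d_out x rank
    B = Vh[:rank, :]             # rank x d_in
    return A, B

W_k = torch.randn(4096, 4096)    # stand-in for a key projection matrix
A, B = low_rank_factorize(W_k, rank=256)
x = torch.randn(1, 4096)
k_approx = x @ B.T @ A.T         # same output dimension, smaller intermediate state
```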
arXiv Detail & Related papers (2024-10-04T03:10:53Z) - EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting [12.006890185810322]
We introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and effective LLM adaptation on edge devices.
Specifically, Edge-LLM features three core components: (1) a layer-wise unified compression (LUC) technique to reduce the computation overhead by generating layer-wise pruning sparsity and quantization bit-width policies, (2) an adaptive layer tuning and voting scheme to reduce the memory overhead by reducing the backpropagation depth, and (3) a complementary hardware scheduling strategy to handle the irregular computation patterns introduced by LUC and adaptive layer tuning.
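As a toy illustration of a layer-wise quantization bit-width policy, the sketch below quantizes each layer's weights to a different bit-width; the policy values and layer names are hypothetical, and the LUC policy search itself is not shown.

```python
# Minimal per-layer uniform quantization with differing bit-widths (illustrative).
import torch

def quantize_uniform(W: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of W to `bits` bits, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = W.abs().max() / qmax
    return torch.clamp(torch.round(W / scale), -qmax, qmax) * scale

# Hypothetical layer-wise bit-width policy.
policy = {"layer.0": 8, "layer.1": 6, "layer.2": 4}
weights = {name: torch.randn(64, 64) for name in policy}
quantized = {name: quantize_uniform(W, policy[name]) for name, W in weights.items()}
```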
arXiv Detail & Related papers (2024-06-22T06:51:47Z) - When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method [56.571951345048355]
Large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications.
We study whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size and finetuning data size, affect the finetuning performance.
arXiv Detail & Related papers (2024-02-27T04:18:49Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
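A hedged sketch of the seed-based idea follows: a perturbation is regenerated from a shared seed on either side, so only a seed and a scalar need to be exchanged; the toy objective, seed choice, and step sizes are assumptions rather than the paper's settings.

```python
# Seed-based zeroth-order update: only (seed, scalar) needs to be communicated.
import torch

def zo_scalar_grad(loss_fn, theta, seed, eps=1e-3):
    """Estimate the directional derivative along a seed-generated perturbation."""
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(theta.shape, generator=gen)
    return (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)

def apply_seed_update(theta, seed, scalar_grad, lr=1e-2):
    """Other side: rebuild the same perturbation from the seed and apply the step."""
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(theta.shape, generator=gen)
    return theta - lr * scalar_grad * z

theta = torch.randn(10)
loss = lambda p: ((p - 1.0) ** 2).sum()   # toy objective
seed = 42                                  # drawn from a small, fixed seed pool
g = zo_scalar_grad(loss, theta, seed)
theta = apply_seed_update(theta, seed, g)
```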
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - Split-and-Denoise: Protect large language model inference with local differential privacy [2.572566198588905]
Split-N-Denoise (SnD) is a private inference framework that splits the model to execute the token embedding layer on the client side at minimal computational cost.
We show SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks.
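For intuition, the sketch below keeps the token embedding layer on the client and perturbs the embeddings before they leave the device; the noise type and scale are assumed, and SnD's client-side denoising step is omitted.

```python
# Client-side split inference with local perturbation (illustrative only).
import torch
import torch.nn as nn

class ClientEmbedder(nn.Module):
    def __init__(self, vocab_size=32000, dim=768, noise_scale=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.noise_scale = noise_scale

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)
        # Local perturbation before anything leaves the device.
        return x + self.noise_scale * torch.randn_like(x)

client = ClientEmbedder()
noisy_embeddings = client(torch.tensor([[101, 2054, 2003, 102]]))
# Only `noisy_embeddings` would be transmitted to the server-side layers.
```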
arXiv Detail & Related papers (2023-10-13T14:17:33Z) - Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
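A toy sketch of the correlated-perturbation idea is given below: each user's update is masked with noise constructed to sum to zero across users, so individual transmissions are obscured while the aggregate is preserved; the paper's channel and power constraints are not modeled.

```python
# Zero-sum correlated perturbations for over-the-air aggregation (toy example).
import numpy as np

rng = np.random.default_rng(0)
num_users, dim = 4, 6
updates = rng.normal(size=(num_users, dim))   # local gradient updates

noise = rng.normal(size=(num_users, dim))
noise -= noise.mean(axis=0, keepdims=True)    # make the perturbations sum to zero

masked = updates + noise                      # what each user transmits
aggregate = masked.sum(axis=0)                # over-the-air superposition at the server
assert np.allclose(aggregate, updates.sum(axis=0))
```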
arXiv Detail & Related papers (2022-10-05T13:13:35Z)