Low-Rank Adaptation Secretly Imitates Differentially Private SGD
- URL: http://arxiv.org/abs/2409.17538v6
- Date: Tue, 03 Jun 2025 16:03:24 GMT
- Title: Low-Rank Adaptation Secretly Imitates Differentially Private SGD
- Authors: Saber Malekmohammadi, Golnoosh Farnadi
- Abstract summary: We show theoretically that low-rank adaptation is equivalent to fine-tuning adapters with noisy batch gradients. We also quantify the variance of the injected noise as a decreasing function of adaptation rank. Low-rank adaptation provides robustness to membership inference attacks with respect to the fine-tuning data.
- Score: 5.359060261460183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As pre-trained language models grow in size, fully fine-tuning their parameters on task adaptation data becomes increasingly impractical. To address this challenge, several methods for low-rank adaptation of language models have been proposed, e.g. LoRA, which incorporates trainable low-rank decomposition matrices into only some parameters of the pre-trained model, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters or adapters. In this work, we look at the low-rank adaptation method through the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning adapters with noisy batch gradients, just as the DPSGD algorithm does. We also quantify the variance of the injected noise as a decreasing function of adaptation rank. By establishing a Berry-Esseen type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution with the same variance, we show that the dynamics of low-rank adaptation are very close to those of DPSGD performed with respect to the adapters. Following our theoretical findings and supported by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks with respect to the fine-tuning data.
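The noisy-gradient view described in the abstract can be illustrated numerically in a simplified setting. This is a minimal sketch under stated assumptions, not the authors' construction: it assumes a LoRA layer W = W0 + B @ A in which only B is trained while A stays frozen at a random initialization with i.i.d. N(0, 1/r) entries. Under that assumption, the chain rule gives dL/dB = G @ A.T for the batch gradient G = dL/dW, so one SGD step on B moves the full weight by -lr * G @ A.T @ A = -lr * (G + G @ (A.T @ A - I)). Because E[A.T @ A] = I, the second term is zero-mean noise whose per-entry variance scales roughly like 1/r. For contrast, the sketch also includes a textbook DPSGD gradient (per-example clipping plus explicit Gaussian noise). The function names `dpsgd_noisy_gradient`, `lora_implicit_gradient`, and `implicit_noise_variance`, as well as all shapes and hyperparameters, are hypothetical.

```python
import numpy as np


def dpsgd_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Textbook DPSGD gradient: clip each example's gradient to L2 norm <= clip_norm,
    sum, add N(0, (noise_multiplier * clip_norm)^2) noise per entry, then average."""
    n = per_example_grads.shape[0]
    norms = np.linalg.norm(per_example_grads.reshape(n, -1), axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Broadcast one scale factor per example over the remaining axes.
    clipped = per_example_grads * scale.reshape(n, *([1] * (per_example_grads.ndim - 1)))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1:])
    return (clipped.sum(axis=0) + noise) / n


def lora_implicit_gradient(batch_grad, rank, rng):
    """Effective full-weight gradient G @ A.T @ A of one LoRA step on B.

    Assumption (illustration only): A has shape (rank, k), is frozen, and has
    i.i.d. N(0, 1/rank) entries, so E[A.T @ A] is the k x k identity.
    """
    k = batch_grad.shape[1]
    A = rng.normal(0.0, 1.0 / np.sqrt(rank), size=(rank, k))
    grad_B = batch_grad @ A.T  # chain rule for W = W0 + B @ A
    return grad_B @ A          # W moves by -lr times this matrix


def implicit_noise_variance(batch_grad, rank, trials=200, seed=0):
    """Mean per-entry variance (over random draws of A) of batch_grad @ (A.T @ A - I)."""
    rng = np.random.default_rng(seed)
    noise = np.stack([lora_implicit_gradient(batch_grad, rank, rng) - batch_grad
                      for _ in range(trials)])
    return float(noise.var(axis=0).mean())


rng = np.random.default_rng(0)
per_example = rng.normal(size=(8, 16, 32))  # hypothetical: 8 examples, W of shape (16, 32)
G = per_example.mean(axis=0)                # batch gradient dL/dW
dp_grad = dpsgd_noisy_gradient(per_example, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print("DPSGD gradient shape:", dp_grad.shape)
for r in (2, 8, 32):
    print(f"rank {r}: implicit noise variance {implicit_noise_variance(G, r):.4f}")
```

Running it prints an implicit-noise variance that shrinks as the rank grows, in line with the abstract's statement that the injected noise variance decreases with adaptation rank. The sketch does not reproduce the paper's Berry-Esseen bound or any privacy accounting.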
Related papers
- Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters [0.0]
We show that by training multiple independent adapters and averaging their outputs, the new model achieves higher performance and is more robust to distribution shifts than any individual adapter.
This is also the first study to explore CLIP adapter-style techniques for DINOv2 and to directly compare them with CLIP in this setting.
(2025-07-08T09:26:10Z)
- Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models [32.68721299475496]
Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness.
We propose a new PEFT method that combines two classes of adaptations, namely, transform and residual adaptations.
Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation.
(2025-01-15T11:10:37Z)
- FineGates: LLMs Finetuning with Compression using Stochastic Gates [7.093692674858257]
Large Language Models (LLMs) present significant challenges for full finetuning due to the high computational demands.
Lightweight finetuning techniques, such as learning low-rank adapter layers, have been proposed.
We propose an adaptor model based on gates that simultaneously sparsify the frozen base model and perform task-specific adaptation.
(2024-12-17T14:33:05Z)
- OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters.
They often pose optimization challenges, with poor convergence.
We introduce an over-parameterized approach that accelerates training without increasing inference costs.
We achieve improvements in vision-language tasks and especially notable increases in image generation.
(2024-12-13T18:55:19Z)
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
(2024-11-21T10:26:17Z)
- DEeR: Deviation Eliminating and Noise Regulating for Privacy-preserving Federated Low-rank Adaptation [29.30782543513243]
We propose a privacy-preserving federated finetuning framework called Deviation Eliminating and Noise Regulating (DEeR).
We show that DEeR outperforms state-of-the-art approaches on public medical datasets.
(2024-10-16T18:11:52Z)
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
(2024-10-05T06:59:50Z)
- SARA: Singular-Value Based Adaptive Low-Rank Adaption [4.135688713311511]
LoRA is a widely used parameter-efficient fine-tuning (PEFT) method because it adds no inference overhead.
In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD.
Based on this, we design Singular-Value Based Adaptive Low-Rank Adaption (SARA).
(2024-08-06T16:39:42Z)
- Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models [108.08773541490191]
Pre-trained language models (PLMs) have a huge number of parameters, so fine-tuning them is often expensive and time-consuming.
It is necessary to adopt a parameter-efficient approach to reduce parameters of PLMs in fine-tuning without compromising their performance in downstream tasks.
In this paper, we design a novel adapter which only acts on self-attention outputs in PLMs.
(2024-07-04T18:21:28Z)
- Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach [11.878350833222711]
We propose a method called GradSamp for sampling gradient updates from a Gaussian distribution.
GradSamp not only streamlines gradient computation but also enables skipping entire epochs, thereby enhancing overall efficiency.
We rigorously validate our hypothesis across a diverse set of standard and non-standard CNN and transformer-based models.
(2024-06-11T15:01:20Z)
- ConvLoRA and AdaBN based Domain Adaptation via Self-Training [4.006331916849688]
We propose Convolutional Low-Rank Adaptation (ConvLoRA) for multi-target domain adaptation.
ConvLoRA freezes pre-trained model weights, adds trainable low-rank decomposition matrices to convolutional layers, and backpropagates the gradient.
Our method has fewer trainable parameters and performs better or on-par with large independent fine-tuned networks.
(2024-02-07T15:43:50Z)
- Curvature-Informed SGD via General Purpose Lie-Group Preconditioners [6.760212042305871]
We present a novel approach to accelerate stochastic gradient descent (SGD) by utilizing curvature information.
Our approach involves two preconditioners: a matrix-free preconditioner and a low-rank approximation preconditioner.
We demonstrate that Preconditioned SGD (PSGD) outperforms SoTA on Vision, NLP, and RL tasks.
(2024-02-07T03:18:00Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
(2023-12-19T06:06:30Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
(2023-10-10T01:04:15Z)
- Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression [33.92792778925365]
We propose a low-rank adaptation approach to address the rate-distortion drop observed in out-of-domain datasets.
Our proposed method exhibits universality across diverse image datasets.
(2023-08-15T12:17:46Z)
- Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
(2022-07-08T18:17:06Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
(2022-04-30T16:49:41Z)
- Adaptive Noisy Data Augmentation for Regularized Estimation and Inference in Generalized Linear Models [15.817569026827451]
We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs).
We demonstrate the superior or similar performance of PANDA against the existing approaches of the same type of regularizers in simulated and real-life data.
(2022-04-18T22:02:37Z)
- Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation [120.69747175899421]
Optimal Transport (OT) distances such as Wasserstein have been used in several areas such as GANs and domain adaptation.
We propose a computationally-efficient dual form of the robust OT optimization that is amenable to modern deep learning applications.
Our approach can train state-of-the-art GAN models on noisy datasets corrupted with outlier distributions.
(2020-10-12T17:13:40Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
(2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.