DAGER: Exact Gradient Inversion for Large Language Models
- URL: http://arxiv.org/abs/2405.15586v1
- Date: Fri, 24 May 2024 14:14:24 GMT
- Title: DAGER: Exact Gradient Inversion for Large Language Models
- Authors: Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev
- Abstract summary: Federated learning works by aggregating locally computed gradients from multiple clients.
Prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks.
We propose DAGER, the first algorithm to recover whole batches of input text exactly.
- Score: 10.998375857698496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied on images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting without any prior on the data for both encoder- and decoder-based architectures using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x at same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).
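The key primitive described in the abstract is a span check on self-attention layer gradients: because the gradient of a linear projection is an outer product of upstream gradients and its inputs, every client embedding must lie in the gradient's row space, and the discreteness of the vocabulary makes this testable token by token. Below is a minimal sketch of that check under simplifying assumptions (a single linear projection, no positional encodings or batching details); the function and variable names are illustrative and not taken from the authors' implementation.

    import torch

    def rowspace_basis(grad_W: torch.Tensor, rel_tol: float = 1e-6) -> torch.Tensor:
        """Orthonormal basis of the row space of a weight gradient.

        For a linear layer Z = X @ W.T, the gradient dL/dW = (dL/dZ).T @ X,
        so its rows are linear combinations of the rows of X -- every client
        input embedding therefore lies in this row space. (Sketch assumption:
        the batch contributes fewer token rows than the embedding dimension,
        so the span is a strict subspace and the test is discriminative.)
        """
        _, S, Vh = torch.linalg.svd(grad_W, full_matrices=False)
        rank = int((S > S.max() * rel_tol).sum())  # numerical rank
        return Vh[:rank]                           # shape (rank, d)

    def span_check(basis: torch.Tensor, candidates: torch.Tensor, tau: float = 1e-3) -> torch.Tensor:
        """Boolean mask over candidate embeddings that lie (numerically) in the span.

        Candidates with near-zero projection residual are compatible with the
        observed gradient; all other tokens can be ruled out.
        """
        proj = candidates @ basis.T @ basis                         # (V, d) projections
        resid = torch.linalg.norm(candidates - proj, dim=-1)
        resid = resid / (torch.linalg.norm(candidates, dim=-1) + 1e-12)
        return resid < tau

    # Illustrative usage (names are hypothetical): filter the whole vocabulary
    # against the gradient of the first attention layer's query projection.
    # mask = span_check(rowspace_basis(grad_q_proj), vocab_embeddings)

The greedy (decoder) and exhaustive heuristic (encoder) searches mentioned in the abstract build on this check to assemble full sequences from the surviving tokens; that machinery, along with positional encodings and batching, is omitted from this sketch.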
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- Privacy-Preserving Logistic Regression Training on Large Datasets [0.0]
We propose an efficient algorithm for logistic regression training on large encrypted data using Homomorphic Encryption (HE).
We also implement a full-batch version of the method for datasets so large that they must be encrypted in mini-batches.
arXiv Detail & Related papers (2024-06-19T05:19:20Z)
- SPEAR: Exact Gradient Inversion of Batches in Federated Learning [11.799563040751591]
Federated learning is a framework for machine learning where clients only share gradient updates and not their private data with a server.
We propose SPEAR, the first algorithm reconstructing whole batches with $b > 1$ exactly.
We show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks.
arXiv Detail & Related papers (2024-03-06T18:52:39Z)
- Maximum Knowledge Orthogonality Reconstruction with Gradients in Federated Learning [12.709670487307294]
Federated learning (FL) aims at keeping client data local to preserve privacy.
Most existing FL approaches assume an FL setting with an unrealistically small batch size.
We propose a novel and completely analytical approach to reconstruct clients' input data.
arXiv Detail & Related papers (2023-10-30T02:01:48Z)
- LOKI: Large-scale Data Reconstruction Attack against Federated Learning through Model Manipulation [25.03733882637947]
We introduce LOKI, an attack that overcomes previous limitations and also breaks the anonymity of aggregation.
With FedAVG and aggregation across 100 clients, prior work can leak less than 1% of images on MNIST, CIFAR-100, and Tiny ImageNet.
Using only a single training round, LOKI is able to leak 76-86% of all data samples.
arXiv Detail & Related papers (2023-03-21T23:29:35Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which instead optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one query to the pre-trained language model per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We employ transformers in this work and incorporate them into a hierarchical framework for shape classification as well as part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art mean accuracy on shape classification and yields results on par with previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure in the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection [0.0]
Reconstruction autoencoder-based methods address out-of-distribution detection by using the input reconstruction error as a metric of novelty versus normality.
We introduce semantic reconstruction, data certainty decomposition, and a normalized L2 distance to substantially improve the original methods.
Our method requires no additional data, hard-to-implement structures, or time-consuming pipelines, and does not harm the classification accuracy of known classes.
arXiv Detail & Related papers (2022-03-04T09:04:55Z)
- See through Gradients: Image Batch Recovery via GradInversion [103.26922860665039]
We introduce GradInversion, with which input images from a larger batch can also be recovered for large networks such as ResNets (50 layers).
We show that gradients encode a surprisingly large amount of information, such that all the individual images can be recovered with high fidelity via GradInversion, even for complex datasets, deep networks, and large batch sizes.
arXiv Detail & Related papers (2021-04-15T16:43:17Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which recomputes normalization statistics on the prediction batch and significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift. (A minimal sketch of the idea follows this entry.)
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
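The entry above centers on recomputing batch normalization statistics at prediction time. The following is a minimal PyTorch sketch of that idea, assuming an ordinary nn.Module containing BatchNorm layers; the function name and details are illustrative rather than the paper's code.

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def predict_with_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
        """Run inference with BatchNorm layers normalizing by the current batch.

        Only the normalization layers are switched to training mode, so they
        compute mean/variance from `x` instead of the stored running averages,
        while dropout and other layers stay in eval mode. Note: this also
        updates the layers' running statistics; snapshot them first if the
        model will be reused afterwards.
        """
        was_training = model.training
        model.eval()
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                m.train()
        out = model(x)
        model.train(was_training)
        return out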