LAMP: Extracting Text from Gradients with Language Model Priors
- URL: http://arxiv.org/abs/2202.08827v1
- Date: Thu, 17 Feb 2022 18:49:25 GMT
- Title: LAMP: Extracting Text from Gradients with Language Model Priors
- Authors: Dimitar I. Dimitrov, Mislav Balunović, Nikola Jovanović, Martin Vechev
- Abstract summary: Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning.
We propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients.
- Score: 9.242965489146398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work shows that sensitive user data can be reconstructed from gradient
updates, breaking the key privacy promise of federated learning. While success
was demonstrated primarily on image data, these methods do not directly
transfer to other domains such as text. In this work, we propose LAMP, a novel
attack tailored to textual data, that successfully reconstructs original text
from gradients. Our key insight is to model the prior probability of the text
with an auxiliary language model, utilizing it to guide the search towards more
natural text. Concretely, LAMP introduces a discrete text transformation
procedure that minimizes both the reconstruction loss and the prior text
probability, as provided by the auxiliary language model. The procedure is
alternated with a continuous optimization of the reconstruction loss, which
also regularizes the length of the reconstructed embeddings. Our experiments
demonstrate that LAMP reconstructs the original text significantly more
precisely than prior work: we recover 5x more bigrams and 23% longer
subsequences on average. Moreover, we are the first to recover inputs from batch
sizes larger than 1 for textual models. These findings indicate that gradient
updates of models operating on textual data leak more information than
previously thought.
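The abstract describes an alternating optimization: a continuous phase fits dummy embeddings to the observed gradients (with a regularizer on embedding lengths), interleaved with a discrete phase that applies text transformations scored by the reconstruction loss plus a language-model prior. The sketch below is a minimal illustration of that loop, not the authors' implementation: a toy linear classifier stands in for the attacked model, GPT-2 serves as the auxiliary prior, the discrete moves are reduced to adjacent token swaps, and names such as grad_match, lm_nll, and the 0.1/0.5 weights are illustrative assumptions.

```python
# Hedged sketch of a LAMP-style attack loop (illustrative, not the paper's code).
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

torch.manual_seed(0)

# Auxiliary language model used as the text prior (the paper uses GPT-2).
tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
emb_matrix = lm.transformer.wte.weight.detach()        # token embedding table

# Toy stand-in for the attacked model: a linear classifier over mean-pooled embeddings.
dim, n_classes = emb_matrix.size(1), 2
model = torch.nn.Linear(dim, n_classes)

def grad_match(emb, label, observed):
    """Reconstruction loss: squared distance between the gradients a candidate
    input would induce and the gradients observed in the federated update."""
    loss = F.cross_entropy(model(emb.mean(dim=0, keepdim=True)), label)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return sum(((g - o) ** 2).sum() for g, o in zip(grads, observed))

def lm_nll(ids):
    """Prior loss: negative log-likelihood of the candidate tokens under GPT-2."""
    with torch.no_grad():
        return lm(ids.unsqueeze(0), labels=ids.unsqueeze(0)).loss.item()

# Simulate the leak: the attacker only observes `observed`, not the secret text.
secret_ids = tok("the password is swordfish", return_tensors="pt").input_ids[0]
seq_len, label = secret_ids.size(0), torch.tensor([1])
secret_loss = F.cross_entropy(model(emb_matrix[secret_ids].mean(dim=0, keepdim=True)), label)
observed = [g.detach() for g in torch.autograd.grad(secret_loss, model.parameters())]

# Attack state: continuous dummy embeddings, refined by alternating phases.
x = torch.randn(seq_len, dim, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    # Continuous phase: minimize gradient matching plus an (assumed) penalty that
    # keeps embedding norms close to typical token-embedding lengths.
    opt.zero_grad()
    rec = grad_match(x, label, observed)
    len_reg = ((x.norm(dim=-1) - emb_matrix.norm(dim=-1).mean()) ** 2).mean()
    (rec + 0.1 * len_reg).backward()
    opt.step()

    # Discrete phase (every 50 steps): project to nearest tokens, try simple
    # transformations (here: adjacent swaps), and keep the candidate minimizing
    # reconstruction loss + 0.5 * language-model negative log-likelihood.
    if (step + 1) % 50 == 0:
        ids = torch.cdist(x.detach(), emb_matrix).argmin(dim=1)
        score = lambda c: grad_match(emb_matrix[c], label, observed).item() + 0.5 * lm_nll(c)
        best, best_score = ids, score(ids)
        for i in range(seq_len - 1):
            cand = ids.clone()
            cand[i], cand[i + 1] = ids[i + 1], ids[i]
            s = score(cand)
            if s < best_score:
                best, best_score = cand, s
        with torch.no_grad():
            x.copy_(emb_matrix[best])   # restart the continuous phase from the best candidate
        print(step + 1, tok.decode(best))
```

In the actual attack the continuous phase matches gradients of a transformer classifier and the discrete phase evaluates a richer set of transformations than the single swap move shown here, but the interplay is the same: the gradient-matching term pulls candidates toward the leaked update, while the language-model prior steers the search toward natural text.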
Related papers
- Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID [44.372336186832584]
We study the transferable text-to-image ReID problem, where we train a model on our proposed large-scale database.
We obtain substantial training data via Multi-modal Large Language Models (MLLMs)
We introduce a novel method that automatically identifies words in a description that do not correspond with the image.
arXiv Detail & Related papers (2024-05-08T10:15:04Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using
Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE)
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z) - Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert
Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z) - Generative Negative Text Replay for Continual Vision-Language
Pretraining [95.2784858069843]
Vision-language pre-training has attracted increasing attention recently.
Massive data are usually collected in a streaming fashion.
We propose a multi-modal knowledge distillation between images and texts to align the instance-wise prediction between old and new models.
arXiv Detail & Related papers (2022-10-31T13:42:21Z) - Text Revealer: Private Text Reconstruction via Model Inversion Attacks
against Transformers [22.491785618530397]
We formulate Text Revealer -- the first model inversion attack for text reconstruction against text classification with transformers.
Our attacks faithfully reconstruct private texts included in training data with access to the target model.
Our experiments demonstrate that our attacks are effective for datasets with different text lengths and can accurately reconstruct private texts.
arXiv Detail & Related papers (2022-09-21T17:05:12Z) - Recovering Private Text in Federated Learning of Language Models [30.646865969760412]
Federated learning allows distributed users to collaboratively train a model while keeping each user's data private.
We present a novel attack method FILM for federated learning of language models.
We show the feasibility of recovering text from large batch sizes of up to 128 sentences.
arXiv Detail & Related papers (2022-05-17T17:38:37Z) - Data-to-Text Generation with Iterative Text Editing [3.42658286826597]
We present a novel approach to data-to-text generation based on iterative text editing.
We first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion task.
The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model.
arXiv Detail & Related papers (2020-11-03T13:32:38Z) - Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.