Information Leakage in Embedding Models
- URL: http://arxiv.org/abs/2004.00053v2
- Date: Wed, 19 Aug 2020 19:58:14 GMT
- Title: Information Leakage in Embedding Models
- Authors: Congzheng Song and Ananth Raghunathan
- Abstract summary: We demonstrate that embeddings, in addition to encoding generic semantics, often also present a vector that leaks sensitive information about the input data.
We develop three classes of attacks to systematically study information that might be leaked by embeddings.
- Score: 19.497371893593918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embeddings are functions that map raw input data to low-dimensional vector
representations, while preserving important semantic information about the
inputs. Pre-training embeddings on a large amount of unlabeled data and
fine-tuning them for downstream tasks is now a de facto standard in achieving
state of the art learning in many domains.
We demonstrate that embeddings, in addition to encoding generic semantics,
often also present a vector that leaks sensitive information about the input
data. We develop three classes of attacks to systematically study information
that might be leaked by embeddings. First, embedding vectors can be inverted to
partially recover some of the input data. As an example, we show that our
attacks on popular sentence embeddings recover between 50% and 70% of the input
words (F1 scores of 0.5--0.7). Second, embeddings may reveal sensitive
attributes inherent in inputs and independent of the underlying semantic task
at hand. Attributes such as authorship of text can be easily extracted by
training an inference model on just a handful of labeled embedding vectors.
Third, embedding models leak a moderate amount of membership information for
infrequent training data inputs. We extensively evaluate our attacks on various
state-of-the-art embedding models in the text domain. We also propose and
evaluate defenses that can prevent the leakage to some extent at a minor cost
in utility.
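The attribute-inference claim above ("attributes such as authorship of text can be easily extracted by training an inference model on just a handful of labeled embedding vectors") can be illustrated with a toy sketch. This is not the paper's method: the embedding dimension, the per-author signal strength, and the nearest-centroid classifier are all invented here for illustration, using synthetic vectors in which an author attribute shifts a few coordinates.

```python
import random

random.seed(0)
DIM, SHIFT = 64, 0.8  # embedding size and per-author signal strength (both made up)

def embed(author):
    """Toy stand-in for a sentence embedding: generic noise plus a
    small author-dependent shift in the first few coordinates."""
    v = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    for i in range(4):  # pretend the attribute leaks through 4 dimensions
        v[i] += SHIFT if author == 1 else -SHIFT
    return v

def centroid(vecs):
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# A "handful" of labeled embeddings: 10 per author.
train = {a: [embed(a) for _ in range(10)] for a in (0, 1)}
cents = {a: centroid(vs) for a, vs in train.items()}

def infer(v):
    # Nearest-centroid attribute inference on the embedding vector.
    return min(cents, key=lambda a: sq_dist(v, cents[a]))

# Evaluate on held-out embeddings.
test = [(a, embed(a)) for a in (0, 1) for _ in range(100)]
acc = sum(infer(v) == a for a, v in test) / len(test)
print(f"attribute-inference accuracy: {acc:.2f}")  # well above the 0.5 chance level
```

Even this crude classifier, trained on ten labeled vectors per class, recovers the attribute far above chance, which is the qualitative point the abstract makes about real sentence embeddings.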
Related papers
- Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks [1.6427658855248815]
In this study, we reproduce GEIA's findings across various neural sentence embedding models.
We propose a simple yet effective method without any modification to the attacker's architecture proposed in GEIA.
Our findings indicate that following our approach, an adversary party can recover meaningful sensitive information related to the pre-training knowledge of the popular models used for creating sentence embeddings.
arXiv Detail & Related papers (2025-04-23T10:50:23Z)
- On Adversarial Examples for Text Classification by Perturbing Latent Representations [0.0]
We show that deep learning text classifiers are vulnerable to adversarial examples crafted by perturbing latent representations.
This weakness indicates that such classifiers are not robust.
We create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
arXiv Detail & Related papers (2024-05-06T18:45:18Z)
- Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors [26.36344184385407]
In this paper, we explore the threat of indiscriminate attacks on downstream tasks that apply pre-trained feature extractors.
We propose two types of attacks: (1) the input space attacks, where we modify existing attacks to craft poisoned data in the input space; and (2) the feature targeted attacks, where we find poisoned features by treating the learned feature representations as a dataset.
Our experiments examine such attacks in popular downstream tasks of fine-tuning on the same dataset and transfer learning that considers domain adaptation.
arXiv Detail & Related papers (2024-02-20T01:12:59Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning [12.335698325757491]
We propose a label leakage attack that allows an adversarial input owner to learn the label owner's private labels.
Our attack can uncover the private label data on several multi-class image classification problems and a binary conversion prediction task with near-perfect accuracy.
While this technique is effective for simpler datasets, it significantly degrades utility for datasets with higher input dimensionality.
arXiv Detail & Related papers (2021-11-25T16:09:59Z)
- Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models [27.100909068228813]
Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack.
In this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector.
Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier.
arXiv Detail & Related papers (2021-03-29T12:19:45Z)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [4.36561468436181]
We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations.
Our approach closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Our code and pretrained models are publicly available and can be easily adapted to new domains or used to embed unseen text.
arXiv Detail & Related papers (2020-06-05T20:00:28Z)
- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.