Membership Inference on Word Embedding and Beyond
- URL: http://arxiv.org/abs/2106.11384v1
- Date: Mon, 21 Jun 2021 19:37:06 GMT
- Title: Membership Inference on Word Embedding and Beyond
- Authors: Saeed Mahloujifar, Huseyin A. Inan, Melissa Chase, Esha Ghosh,
Marcello Hasegawa
- Abstract summary: We show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions.
We also show that this leakage persists through two other major NLP applications: classification and text-generation.
Our attack is a cheaper membership inference attack on text-generative models.
- Score: 17.202696286248294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the text processing context, most ML models are built on word embeddings.
These embeddings are themselves trained on some datasets, potentially
containing sensitive data. In some cases this training is done independently,
in other cases, it occurs as part of training a larger, task-specific model. In
either case, it is of interest to consider membership inference attacks based
on the embedding layer as a way of understanding sensitive information leakage.
Yet, somewhat surprisingly, membership inference attacks on word embeddings,
and their effect on other natural language processing (NLP) tasks that use
these embeddings, have remained relatively unexplored.
In this work, we show that word embeddings are vulnerable to black-box
membership inference attacks under realistic assumptions. Furthermore, we show
that this leakage persists through two other major NLP applications:
classification and text-generation, even when the embedding layer is not
exposed to the attacker. We show that our MI attack achieves high attack
accuracy against a classifier model and an LSTM-based language model. Indeed,
our attack is a cheaper membership inference attack on text-generative models:
it requires neither knowledge of the target model nor the expensive training of
text-generative shadow models.
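To make the threat model concrete, below is a minimal, hypothetical sketch of a black-box membership inference test against a word embedding, written in Python. It is not the authors' exact attack: the `embed` accessor, the window size, and the decision threshold are illustrative assumptions. The intuition it encodes is that words which co-occurred in the embedding's training data tend to receive more similar vectors than words that did not.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def membership_score(sentence, embed, window=5):
    # Average pairwise cosine similarity of word vectors inside a sliding
    # window; higher values suggest the sentence was seen during training.
    words = sentence.lower().split()
    sims = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            try:
                sims.append(cosine(embed(words[i]), embed(words[j])))
            except KeyError:  # out-of-vocabulary word: skip the pair
                continue
    return float(np.mean(sims)) if sims else 0.0

def infer_membership(sentence, embed, threshold=0.25):
    # 'threshold' is a placeholder; an attacker would calibrate it on
    # reference text known to lie outside the training data.
    return membership_score(sentence, embed) > threshold
```

Here `embed(word)` stands for the black-box query interface that returns a word's vector; calibrating the threshold against known non-member text is left to the attacker.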
Related papers
- SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification [15.932462099791307]
We propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training).
SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples.
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models.
arXiv Detail & Related papers (2023-07-04T05:41:31Z)
- Unintended Memorization and Timing Attacks in Named Entity Recognition Models [5.404816271595691]
We study the setting in which NER models are available as a black-box service for identifying sensitive information in user documents.
With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models.
arXiv Detail & Related papers (2022-11-04T03:32:16Z)
- Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models [22.569705869469814]
We focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data.
We devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art.
We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary.
arXiv Detail & Related papers (2022-01-23T21:27:20Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks [10.290050493635343]
Adversarial attacks expose important blind spots of deep learning systems.
Character-level attacks typically insert typos into the input stream.
We show that an untrained iterative approach can perform on par with human crowd-workers supervised via 3-shot learning.
arXiv Detail & Related papers (2021-06-02T20:21:03Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Privacy Analysis of Deep Learning in the Wild: Membership Inference Attacks against Transfer Learning [27.494206948563885]
We present the first systematic evaluation of membership inference attacks against transfer learning models.
Experiments on four real-world image datasets show that membership inference can achieve effective performance.
Our results shed light on the severity of membership risks stemming from machine learning models in practice.
arXiv Detail & Related papers (2020-09-10T14:14:22Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores (a minimal label-only sketch appears after this list).
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric that evaluates how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
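As a companion to the sampling-attack entry above, here is a minimal, hypothetical label-only membership inference sketch in Python. The `predict_label` and `perturb` callables, the number of queries, and the decision threshold are illustrative assumptions rather than that paper's settings; the idea is simply that training points tend to keep their predicted label under small perturbations more often than non-members do.

```python
import numpy as np

def label_stability(x, predict_label, perturb, n_queries=100):
    # Fraction of perturbed copies of x whose predicted label matches the
    # original prediction; a proxy for the confidence score that a
    # label-only adversary cannot observe directly.
    base = predict_label(x)
    kept = sum(predict_label(perturb(x)) == base for _ in range(n_queries))
    return kept / n_queries

def infer_membership(x, predict_label, perturb, threshold=0.9):
    # Declare 'member' when the label is highly stable under perturbation.
    return label_stability(x, predict_label, perturb) > threshold

def gaussian_perturb(x, sigma=0.05):
    # Example perturbation for continuous inputs (illustrative only).
    return np.asarray(x) + np.random.normal(scale=sigma, size=np.shape(x))
```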
This list is automatically generated from the titles and abstracts of the papers in this site.