Unintended Memorization and Timing Attacks in Named Entity Recognition
Models
- URL: http://arxiv.org/abs/2211.02245v1
- Date: Fri, 4 Nov 2022 03:32:16 GMT
- Title: Unintended Memorization and Timing Attacks in Named Entity Recognition
Models
- Authors: Rana Salal Ali and Benjamin Zi Hao Zhao and Hassan Jameel Asghar and
Tham Nguyen and Ian David Wood and Dali Kaafar
- Abstract summary: We study the setting in which NER models are available as a black-box service for identifying sensitive information in user documents.
With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models.
- Score: 5.404816271595691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named entity recognition (NER) models are widely used for identifying named
entities (e.g., individuals, locations, and other information) in text
documents. Machine learning based NER models are increasingly being applied in
privacy-sensitive applications that need automatic and scalable identification
of sensitive information to redact text for data sharing. In this paper, we
study the setting in which NER models are available as a black-box service for
identifying sensitive information in user documents and show that these models
are vulnerable to membership inference on their training datasets. With updated
pre-trained NER models from spaCy, we demonstrate two distinct membership
attacks on these models. Our first attack capitalizes on unintended
memorization in the NER's underlying neural network, a phenomenon neural
networks are known to exhibit. Our second attack leverages a timing side-channel
to target NER models that maintain vocabularies constructed from the training
data. We show that words from the training dataset follow different functional
paths than words not previously seen, and that these paths have measurable
differences in execution time. Revealing the membership status of training
samples has clear privacy implications: in text redaction, for example, the
sensitive words or phrases that are to be found and removed are at risk of
being revealed as present in the training dataset. Our
experimental evaluation includes the redaction of both password and health
data, presenting both security risks and privacy/regulatory issues. This is
exacerbated by results that show memorization with only a single phrase. We
achieved 70% AUC in our first attack on a text redaction use-case. We also show
overwhelming success in the timing attack with 99.23% AUC. Finally, we discuss
potential mitigation approaches to realize the safe use of NER models in light
of the privacy and security implications of membership inference attacks.
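As a rough illustration of the timing side-channel described above, the snippet below times repeated NER calls on a candidate phrase against a locally loaded spaCy pipeline. This is a minimal sketch of the general idea only, not the authors' attack code: the model name ("en_core_web_sm"), the candidate phrases, the repetition count, and the first-call-versus-later-calls heuristic are all assumptions made here for illustration.

```python
# Illustrative timing probe: if a phrase's lexemes are already in the
# pipeline's vocabulary, the first call should cost roughly the same as
# later calls; if they are unseen, the first call also pays the cost of
# inserting them, so the gap between the first and later calls grows.
# Sketch of the general side-channel idea only, not the paper's code.
import statistics
import time

import spacy

nlp = spacy.load("en_core_web_sm")  # stand-in for the black-box NER service


def first_call_gap(phrase: str, repeats: int = 30) -> float:
    """Seconds by which the first NER call on `phrase` exceeds the median
    of the subsequent calls (a larger gap suggests the phrase was unseen)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        nlp(phrase)  # query the NER model
        timings.append(time.perf_counter() - start)
    return timings[0] - statistics.median(timings[1:])


# Hypothetical candidates whose membership we want to score.
for candidate in ["Jane Q. Example", "zqxjv wplomk"]:
    gap_ms = first_call_gap(candidate) * 1e3
    print(f"{candidate!r}: first-call gap {gap_ms:+.3f} ms")
```

In the paper itself, per-phrase signals of this kind are evaluated as a membership test, reaching 99.23% AUC for the timing attack and 70% AUC for the memorization-based attack on the text redaction use-case.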
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Planting and Mitigating Memorized Content in Predictive-Text Language
Models [11.911353678499008]
Language models are widely deployed to provide automatic text completion services in user products.
Recent research has revealed that language models bear considerable risk of memorizing private training data.
In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text.
arXiv Detail & Related papers (2022-12-16T17:57:14Z) - Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubts on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
arXiv Detail & Related papers (2022-03-31T18:06:28Z) - Are Your Sensitive Attributes Private? Novel Model Inversion Attribute
Inference Attacks on Classification Models [22.569705869469814]
We focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data.
We devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art.
We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary.
arXiv Detail & Related papers (2022-01-23T21:27:20Z) - Attribute Inference Attack of Speech Emotion Recognition in Federated
Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data.
We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters.
We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z) - Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set.
We derive new attack algorithms that can achieve a high AUC score while also highlighting the different factors that affect their performance.
Our algorithms capture a very precise approximation of privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of privacy risk in machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z) - Distantly-Supervised Named Entity Recognition with Noise-Robust Learning
and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z) - Membership Inference on Word Embedding and Beyond [17.202696286248294]
We show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions.
We also show that this leakage persists through two other major NLP applications: classification and text-generation.
Our attack is a cheaper membership inference attack on text-generative models.
arXiv Detail & Related papers (2021-06-21T19:37:06Z) - Black-box Model Inversion Attribute Inference Attacks on Classification
Models [32.757792981935815]
We focus on one kind of model inversion attacks, where the adversary knows non-sensitive attributes about instances in the training data.
We devise two novel model inversion attribute inference attacks -- confidence modeling-based attack and confidence score-based attack.
We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained with two real datasets.
arXiv Detail & Related papers (2020-12-07T01:14:19Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.