Unintended Memorization and Timing Attacks in Named Entity Recognition
Models
- URL: http://arxiv.org/abs/2211.02245v1
- Date: Fri, 4 Nov 2022 03:32:16 GMT
- Title: Unintended Memorization and Timing Attacks in Named Entity Recognition
Models
- Authors: Rana Salal Ali and Benjamin Zi Hao Zhao and Hassan Jameel Asghar and
Tham Nguyen and Ian David Wood and Dali Kaafar
- Abstract summary: We study the setting in which NER models are available as a black-box service for identifying sensitive information in user documents.
With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models.
- Score: 5.404816271595691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named entity recognition (NER) models are widely used for identifying named
entities (e.g., individuals, locations, and other information) in text
documents. Machine learning based NER models are increasingly being applied in
privacy-sensitive applications that need automatic and scalable identification
of sensitive information to redact text for data sharing. In this paper, we
study the setting in which NER models are available as a black-box service for
identifying sensitive information in user documents and show that these models
are vulnerable to membership inference on their training datasets. With updated
pre-trained NER models from spaCy, we demonstrate two distinct membership
attacks on these models. Our first attack capitalizes on unintended
memorization in the NER's underlying neural network, a phenomenon neural
networks are known to exhibit. Our second attack leverages a timing side-channel
to target NER models that maintain vocabularies constructed from the training
data. We show that words from the training dataset follow different functional
paths than words not previously seen, and that these paths have measurable
differences in execution time. Revealing the membership status of training
samples has clear privacy implications: in text redaction, for example, the
sensitive words or phrases that are to be found and removed are at risk of
being revealed as present in the training dataset. Our
experimental evaluation includes the redaction of both password and health
data, presenting both security risks and privacy/regulatory issues. This is
exacerbated by results that show memorization with only a single phrase. We
achieved 70% AUC in our first attack on a text redaction use-case. We also show
overwhelming success in the timing attack with 99.23% AUC. Finally, we discuss
potential mitigation approaches to realize the safe use of NER models in light
of the privacy and security implications of membership inference attacks.
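As a rough illustration of the timing side-channel described above, the snippet below times repeated NER calls on a candidate phrase against a locally loaded spaCy pipeline. This is a minimal sketch of the general idea only, not the authors' attack code: the model name ("en_core_web_sm"), the candidate phrases, the repetition count, and the first-call-versus-later-calls heuristic are all assumptions made here for illustration.

```python
# Illustrative timing probe: if a phrase's lexemes are already in the
# pipeline's vocabulary, the first call should cost roughly the same as
# later calls; if they are unseen, the first call also pays the cost of
# inserting them, so the gap between the first and later calls grows.
# Sketch of the general side-channel idea only, not the paper's code.
import statistics
import time

import spacy

nlp = spacy.load("en_core_web_sm")  # stand-in for the black-box NER service


def first_call_gap(phrase: str, repeats: int = 30) -> float:
    """Seconds by which the first NER call on `phrase` exceeds the median
    of the subsequent calls (a larger gap suggests the phrase was unseen)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        nlp(phrase)  # query the NER model
        timings.append(time.perf_counter() - start)
    return timings[0] - statistics.median(timings[1:])


# Hypothetical candidates whose membership we want to score.
for candidate in ["Jane Q. Example", "zqxjv wplomk"]:
    gap_ms = first_call_gap(candidate) * 1e3
    print(f"{candidate!r}: first-call gap {gap_ms:+.3f} ms")
```

In the paper itself, per-phrase signals of this kind are evaluated as a membership test, reaching 99.23% AUC for the timing attack and 70% AUC for the memorization-based attack on the text redaction use-case.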
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Planting and Mitigating Memorized Content in Predictive-Text Language
Models [11.911353678499008]
Language models are widely deployed to provide automatic text completion services in user products.
Recent research has revealed that language models bear considerable risk of memorizing private training data.
In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text.
arXiv Detail & Related papers (2022-12-16T17:57:14Z) - Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubts on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
arXiv Detail & Related papers (2022-03-31T18:06:28Z) - Are Your Sensitive Attributes Private? Novel Model Inversion Attribute
Inference Attacks on Classification Models [22.569705869469814]
We focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data.
We devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art.
We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary.
arXiv Detail & Related papers (2022-01-23T21:27:20Z) - Attribute Inference Attack of Speech Emotion Recognition in Federated
Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data.
We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters.
We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z) - Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set.
We derive new attack algorithms that can achieve a high AUC score while also highlighting the different factors that affect their performance.
Our algorithms capture a very precise approximation of privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of privacy risk in machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z) - Distantly-Supervised Named Entity Recognition with Noise-Robust Learning
and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z) - Membership Inference on Word Embedding and Beyond [17.202696286248294]
We show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions.
We also show that this leakage persists through two other major NLP applications: classification and text-generation.
Our attack is a cheaper membership inference attack on text-generative models.
arXiv Detail & Related papers (2021-06-21T19:37:06Z) - Black-box Model Inversion Attribute Inference Attacks on Classification
Models [32.757792981935815]
We focus on one kind of model inversion attacks, where the adversary knows non-sensitive attributes about instances in the training data.
We devise two novel model inversion attribute inference attacks -- confidence modeling-based attack and confidence score-based attack.
We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained with two real datasets.
arXiv Detail & Related papers (2020-12-07T01:14:19Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.