Privacy Leakage in Text Classification: A Data Extraction Approach
- URL: http://arxiv.org/abs/2206.04591v1
- Date: Thu, 9 Jun 2022 16:14:26 GMT
- Title: Privacy Leakage in Text Classification: A Data Extraction Approach
- Authors: Adel Elmahdy, Huseyin A. Inan, Robert Sim
- Abstract summary: We study the potential privacy leakage in the text classification domain by investigating the problem of unintended memorization of training data.
We propose an algorithm to extract missing tokens of a partial text by exploiting the likelihood of the class label provided by the model.
- Score: 9.045332526072828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has demonstrated the successful extraction of training data from
generative language models. However, it is not evident whether such extraction
is feasible in text classification models since the training objective is to
predict the class label as opposed to next-word prediction. This poses an
interesting challenge and raises an important question regarding the privacy of
training data in text classification settings. Therefore, we study the
potential privacy leakage in the text classification domain by investigating
the problem of unintended memorization of training data that is not pertinent
to the learning task. We propose an algorithm to extract missing tokens of a
partial text by exploiting the likelihood of the class label provided by the
model. We test the effectiveness of our algorithm by inserting canaries into
the training set and attempting to extract tokens in these canaries
post-training. In our experiments, we demonstrate that successful extraction is
possible to some extent. This can also be used as an auditing strategy to
assess any potential unauthorized use of personal data without consent.
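The abstract's extraction idea can be sketched as a fill-in-the-blank search: try each candidate token in the masked position and keep the one that maximizes the probability the classifier assigns to the example's known class label. The sketch below is illustrative only; the function names, the `[MASK]` placeholder convention, and the toy classifier are assumptions of mine, not the authors' implementation.

```python
# Illustrative sketch of the paper's extraction approach: rank candidate
# fillers for a missing token by the likelihood of the class label that the
# classification model assigns to the completed text.
from typing import Callable, Sequence


def extract_missing_token(
    partial_text: str,                       # text containing a "[MASK]" placeholder
    true_label: int,                         # class label of the training example
    candidates: Sequence[str],               # candidate tokens to try in the blank
    label_prob: Callable[[str, int], float], # model's P(label | text)
) -> str:
    """Return the candidate that maximizes the model's label likelihood."""
    def score(token: str) -> float:
        return label_prob(partial_text.replace("[MASK]", token), true_label)
    return max(candidates, key=score)


# Toy stand-in for a trained classifier that memorized the canary
# "my pin is 1234" under label 1: it assigns label 1 high probability
# only when the memorized completion appears.
def toy_label_prob(text: str, label: int) -> float:
    p1 = 0.95 if text == "my pin is 1234" else 0.10
    return p1 if label == 1 else 1.0 - p1


guess = extract_missing_token(
    "my pin is [MASK]", 1, ["0000", "1111", "1234", "9999"], toy_label_prob
)
print(guess)  # recovers the memorized canary token "1234"
```

In practice `label_prob` would wrap a real trained classifier, and the candidate set would be the model's vocabulary, but the ranking principle is the same.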
Related papers
- Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions leverage techniques from machine learning privacy, such as Membership Inference Attacks (MIAs).
We propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
- Protecting Privacy in Classifiers by Token Manipulation [3.5033860596797965]
We focus on text classification models, examining various token mapping and contextualized manipulation functions.
We find that although some token mapping functions are easy and straightforward to implement, they heavily influence performance on the downstream task.
In comparison, the contextualized manipulation provides an improvement in performance.
arXiv Detail & Related papers (2024-07-01T14:41:59Z)
- Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models [3.546617486894182]
We introduce HAST, a new and effective self-training strategy, which is evaluated on four text classification benchmarks.
Results show that it outperforms the reproduced self-training approaches and reaches classification results comparable to previous experiments for three out of four datasets.
arXiv Detail & Related papers (2024-06-13T15:06:11Z)
- Self-supervised Pre-training of Text Recognizers [0.0]
We study self-supervised pre-training methods based on masked label prediction.
We perform experiments on historical handwritten (Bentham) and historical printed datasets.
The evaluation shows that the self-supervised pre-training on data from the target domain is very effective, but it struggles to outperform transfer learning from closely related domains.
arXiv Detail & Related papers (2024-05-01T09:58:57Z)
- Detecting Pretraining Data from Large Language Models [90.12037980837738]
We study the pretraining data detection problem.
Given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text?
We introduce a new detection method Min-K% Prob based on a simple hypothesis.
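The Min-K% Prob idea can be sketched as follows: score a text by the average log-probability of its k% least-likely tokens under the model; texts seen during training tend not to contain extreme low-probability outlier tokens, so a higher score suggests membership. This is a minimal sketch under my own naming; token log-probabilities are supplied by the caller, where a real autoregressive LM would produce them.

```python
# Hedged sketch of the Min-K% Prob scoring rule: average the bottom k% of
# per-token log-probabilities; a higher average is evidence the text was in
# the pretraining data.
from typing import Sequence


def min_k_percent_prob(token_logprobs: Sequence[float], k: float = 20.0) -> float:
    """Average of the k% lowest token log-probabilities (higher => more member-like)."""
    n = max(1, int(len(token_logprobs) * k / 100))
    lowest = sorted(token_logprobs)[:n]  # the most "surprising" tokens
    return sum(lowest) / n


# Toy illustration: a "seen" text has uniformly plausible tokens, while an
# "unseen" text contains a few highly surprising (very low log-prob) tokens.
seen = [-1.0, -1.2, -0.8, -1.1, -0.9, -1.0, -1.3, -0.7, -1.0, -1.1]
unseen = [-1.0, -1.2, -0.8, -9.5, -0.9, -1.0, -8.7, -0.7, -1.0, -1.1]

print(min_k_percent_prob(seen))    # near -1.25: no surprising tokens
print(min_k_percent_prob(unseen))  # near -9.1: dominated by outlier tokens
```

A membership decision would then threshold this score, with the threshold calibrated on held-out data.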
arXiv Detail & Related papers (2023-10-25T17:21:23Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation [56.57532238195446]
We propose a method named Ethicist for targeted training data extraction.
To elicit memorization, we tune soft prompt embeddings while keeping the model fixed.
We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark.
arXiv Detail & Related papers (2023-07-10T08:03:41Z)
- Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
We investigate and benchmark tricks for improving training data extraction using a publicly available dataset.
The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction.
arXiv Detail & Related papers (2023-02-09T06:46:42Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.