Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft
Prompting and Calibrated Confidence Estimation
- URL: http://arxiv.org/abs/2307.04401v1
- Date: Mon, 10 Jul 2023 08:03:41 GMT
- Title: Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft
Prompting and Calibrated Confidence Estimation
- Authors: Zhexin Zhang, Jiaxin Wen, Minlie Huang
- Abstract summary: We propose a method named Ethicist for targeted training data extraction.
To elicit memorization, we tune soft prompt embeddings while keeping the model fixed.
We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark.
- Score: 56.57532238195446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained language models achieve impressive results across many
tasks. However, recent works point out that pre-trained language models may
memorize a considerable fraction of their training data, leading to the privacy
risk of information leakage. In this paper, we propose a method named Ethicist
for targeted training data extraction through loss smoothed soft prompting and
calibrated confidence estimation, investigating how to recover the suffix in
the training data when given a prefix. To elicit memorization in the attacked
model, we tune soft prompt embeddings while keeping the model fixed. We further
propose a smoothing loss that smooths the loss distribution of the suffix
tokens to make it easier to sample the correct suffix. In order to select the
most probable suffix from a collection of sampled suffixes and estimate the
prediction confidence, we propose a calibrated confidence estimation method,
which normalizes the confidence of the generated suffixes with a local
estimation. We show that Ethicist significantly improves the extraction
performance on a recently proposed public benchmark. We also investigate
several factors influencing the data extraction performance, including decoding
strategy, model scale, prefix length, and suffix length. Our code is available
at https://github.com/thu-coai/Targeted-Data-Extraction.
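As summarized above, the method tunes only a small set of soft prompt embeddings while the attacked model stays frozen, and adds a smoothing loss over the suffix tokens. The PyTorch sketch below illustrates that setup under stated assumptions: the model name ("gpt2" as a stand-in), the number of prompt tokens, the top-k smoothing formulation, and the weight alpha are illustrative choices, not the paper's exact recipe (see the linked repository for that).

```python
# Minimal sketch of soft prompt tuning on a frozen causal LM to elicit
# memorized suffixes. Hyperparameters and the top-k smoothing term are
# illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for the attacked model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():              # the attacked model stays fixed
    p.requires_grad_(False)

n_prompt, top_k, alpha = 20, 5, 0.5       # assumed hyperparameters
emb = model.get_input_embeddings()
soft_prompt = torch.nn.Parameter(emb.weight[:n_prompt].detach().clone())
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)

def train_step(prefix_ids, suffix_ids):
    """One step on a (prefix, suffix) pair; both are LongTensors of shape [B, L]."""
    inp = torch.cat([prefix_ids, suffix_ids], dim=1)
    inputs_embeds = torch.cat(
        [soft_prompt.unsqueeze(0).expand(inp.size(0), -1, -1), emb(inp)], dim=1)
    logits = model(inputs_embeds=inputs_embeds).logits

    start = n_prompt + prefix_ids.size(1) - 1      # positions predicting the suffix
    log_probs = F.log_softmax(
        logits[:, start:start + suffix_ids.size(1), :], dim=-1)

    # standard negative log-likelihood of the ground-truth suffix
    nll = -log_probs.gather(-1, suffix_ids.unsqueeze(-1)).squeeze(-1)
    mle_loss = nll.mean()

    # assumed smoothing term: spread probability mass over the top-k candidates
    # at each suffix position, flattening the per-token loss distribution
    smooth_loss = -log_probs.topk(top_k, dim=-1).values.mean()

    loss = mle_loss + alpha * smooth_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```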
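The abstract also describes selecting the most probable suffix from a collection of sampled suffixes and calibrating its confidence by normalizing against a local estimation. The sketch below implements one plausible reading: score each sampled candidate by its sequence log-likelihood and normalize within the pool sampled for the same prefix. The paper's exact normalization may differ, and in practice the tuned soft prompt from the previous sketch would be prepended when scoring.

```python
# Minimal sketch of candidate selection with locally normalized confidence:
# score each sampled suffix by sequence log-likelihood, then normalize across
# the sampled pool for the same prefix. One plausible reading of the
# calibration step, not the paper's exact formula.
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank_candidates(model, prefix_ids, candidate_suffixes):
    """prefix_ids: [1, Lp]; candidate_suffixes: list of [1, Ls] LongTensors."""
    seq_log_probs = []
    for suffix_ids in candidate_suffixes:
        inp = torch.cat([prefix_ids, suffix_ids], dim=1)
        logits = model(input_ids=inp).logits
        start = prefix_ids.size(1) - 1              # positions predicting the suffix
        log_probs = F.log_softmax(
            logits[:, start:start + suffix_ids.size(1), :], dim=-1)
        token_lp = log_probs.gather(-1, suffix_ids.unsqueeze(-1)).squeeze(-1)
        seq_log_probs.append(token_lp.sum())

    seq_log_probs = torch.stack(seq_log_probs)
    # local normalization: a candidate's likelihood relative to the other
    # suffixes sampled for this prefix, used as the calibrated confidence
    confidence = torch.softmax(seq_log_probs, dim=0)
    best = int(confidence.argmax())
    return candidate_suffixes[best], float(confidence[best])
```

A natural variant is to deduplicate identical samples (or pool their mass) before normalizing, so a frequently re-sampled suffix does not dilute the confidence reported for the best candidate.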
Related papers
- Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions to this problem leverage techniques explored in machine learning privacy, such as Membership Inference Attacks (MIAs).
We propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Emergent and Predictable Memorization in Large Language Models [23.567027014457775]
Memorization, or the tendency of large language models to output entire sequences from their training data verbatim, is a key concern for safely deploying language models.
We seek to predict which sequences will be memorized before a large model's full train-time by extrapolating the memorization behavior of lower-compute trial runs.
We provide further novel discoveries on the distribution of memorization scores across models and data.
arXiv Detail & Related papers (2023-04-21T17:58:31Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- Checklist Models for Improved Output Fluency in Piano Fingering Prediction [33.52847881359949]
We present a new approach for the task of predicting fingerings for piano music.
We put forward a checklist system, trained via reinforcement learning, that maintains a representation of recent predictions.
We demonstrate significant gains in performability directly attributable to improvements with respect to these metrics.
arXiv Detail & Related papers (2022-09-12T21:27:52Z)
- Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes Detecting Fake Without Forgetting, a continual-learning-based method that enables the model to learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z)
- Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing [0.8312466807725921]
We propose a semi-supervised machine learning training strategy to improve event detection performance on sequential data.
Our method uses noisy guesses of the events' end times to train event detection models.
We show that our strategy outperforms conservative estimates by 12 points of mean average precision for MNIST, and 3.5 points for CIFAR.
arXiv Detail & Related papers (2020-11-28T09:54:44Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.