Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks
- URL: http://arxiv.org/abs/2310.13291v1
- Date: Fri, 20 Oct 2023 05:44:39 GMT
- Title: Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks
- Authors: Ruixiang Tang, Gord Lueck, Rodolfo Quispe, Huseyin A Inan, Janardhan
Kulkarni, Xia Hu
- Abstract summary: We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
- Score: 65.21536453075275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have revolutionized the field of NLP by achieving
state-of-the-art performance on various tasks. However, there is a concern that
these models may inadvertently disclose information from their training data. In
this study, we focus on the summarization task and investigate the membership
inference (MI) attack: given a sample and black-box access to a model's API, is it
possible to determine whether the sample was part of the training data? We exploit
text similarity and the model's resistance to document modifications as potential
MI signals and evaluate their effectiveness on widely used datasets. Our results
demonstrate that summarization models are at risk of exposing data membership,
even in cases where the reference summary is not available. Furthermore, we
discuss several safeguards for training summarization models to protect against
MI attacks and examine the inherent trade-off between privacy and utility.
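As a rough illustration of the two black-box signals described in the abstract, the sketch below scores a candidate document by (1) the similarity between the model's generated summary and a reference summary, when one is available, and (2) how stable the generated summary remains when the document is modified. The `query_summarizer` function, the token-overlap metric standing in for ROUGE, the sentence-dropping perturbation, and the equal weighting are all illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only: `query_summarizer` is a hypothetical black-box API,
# token_f1 is a crude stand-in for ROUGE, and the 0.5/0.5 weighting is arbitrary.
import random


def token_f1(a: str, b: str) -> float:
    """Unigram-overlap F1 between two texts (rough proxy for ROUGE)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    overlap = len(ta & tb)
    prec, rec = overlap / len(tb), overlap / len(ta)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)


def perturb(document: str, drop_rate: float = 0.1) -> str:
    """Randomly drop a fraction of sentences to create a modified document."""
    sentences = document.split(". ")
    kept = [s for s in sentences if random.random() > drop_rate]
    return ". ".join(kept or sentences[:1])


def membership_score(document: str, reference_summary, query_summarizer,
                     n_perturbations: int = 5) -> float:
    """Higher score => more evidence that `document` was in the training set."""
    base = query_summarizer(document)
    # Signal 1: similarity between generated and reference summary (if any).
    similarity = token_f1(reference_summary, base) if reference_summary else 0.0
    # Signal 2: resistance to document modifications -- members' summaries
    # tend to change less when the input document is perturbed.
    stability = sum(
        token_f1(base, query_summarizer(perturb(document)))
        for _ in range(n_perturbations)
    ) / n_perturbations
    return 0.5 * similarity + 0.5 * stability
```

A decision threshold for this score would then be calibrated on documents known to be outside the training set.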
Related papers
- Extracting Training Data from Document-Based VQA Models [67.1470112451617]
Vision-Language Models (VLMs) have made remarkable progress in document-based Visual Question Answering (i.e., responding to queries about the contents of an input document provided as an image).
We show these models can memorise responses for training samples and regurgitate them even when the relevant visual information has been removed.
This includes Personally Identifiable Information repeated once in the training set, indicating these models could divulge sensitive information and therefore pose a privacy risk.
arXiv Detail & Related papers (2024-07-11T17:44:41Z)
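A minimal probe of the regurgitation behaviour described above might query the model with a training question but a blanked-out page, and check whether the memorised training answer still comes back. `query_vlm` is a hypothetical stand-in for a document-VQA API; the exact-match comparison and the white image are simplifying assumptions rather than the paper's protocol.

```python
# Hypothetical regurgitation probe: `query_vlm(image, question)` stands in for
# a document-VQA API; a blank page simulates removing the visual evidence.
from PIL import Image


def regurgitates(query_vlm, question: str, training_answer: str,
                 page_size=(1000, 1400)) -> bool:
    blank_page = Image.new("RGB", page_size, color="white")
    answer = query_vlm(blank_page, question)
    # Exact (case-insensitive) match is a simplification; fuzzy matching could
    # be used to catch near-verbatim reproductions as well.
    return answer.strip().lower() == training_answer.strip().lower()
```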
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
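The entry above singles out instruction tuning with both positive and negative examples; the records below are an invented illustration of what such pairs might look like for contextual privacy protection (use contextual information when the task requires it, withhold PII otherwise). They are not drawn from the paper.

```python
# Invented illustration of positive/negative instruction-tuning records for
# contextual privacy protection; not taken from the PrivacyMind paper.
tuning_examples = [
    {   # positive: the contextual detail is needed to complete the task
        "instruction": "Summarize the meeting notes for the attendees.",
        "context": "Attendees: A. Smith (project lead). Decision: ship v2 in May.",
        "response": "A. Smith confirmed that version 2 will ship in May.",
    },
    {   # negative: the request asks for personal information beyond the task
        "instruction": "What is A. Smith's home address?",
        "context": "Attendees: A. Smith (project lead). Decision: ship v2 in May.",
        "response": "I can't share personal contact details from these notes.",
    },
]
```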
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
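One common way to realise contrastive learning over incomplete rows, sketched below under the assumption of a SimCLR-style setup, is to treat two random maskings of the same row's observed features as a positive pair and pull their embeddings together with an InfoNCE loss; the paper's actual framework and visual-analytics components may differ.

```python
# Generic sketch (an assumption, not the paper's exact design): two random
# maskings of the same row's observed features form a positive pair, and an
# InfoNCE loss pulls their embeddings together.
import numpy as np

rng = np.random.default_rng(0)


def make_view(x: np.ndarray, keep_prob: float = 0.7) -> np.ndarray:
    """Zero-fill missing entries (NaN) and randomly mask observed ones."""
    observed = ~np.isnan(x)
    mask = observed & (rng.random(x.shape) < keep_prob)
    return np.where(mask, np.nan_to_num(x), 0.0)


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """Each row of z1 should be closest to the corresponding row of z2."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


X = np.array([[1.0, np.nan, 3.0], [0.5, 2.0, np.nan]])  # toy rows with gaps
W = rng.normal(size=(3, 4))                              # stand-in encoder
loss = info_nce(make_view(X) @ W, make_view(X) @ W)
```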
- Membership Inference Attacks against Language Models via Neighbourhood Comparison [45.086816556309266]
Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not.
Recent work has demonstrated that reference-based attacks which compare model scores to those obtained from a reference model trained on similar data can substantially improve the performance of MIAs.
We investigate their performance in more realistic scenarios and find that they are highly fragile with respect to the data distribution used to train reference models.
arXiv Detail & Related papers (2023-05-29T07:06:03Z)
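The neighbourhood-comparison idea can be sketched as follows: instead of a reference model, compare the target model's loss on the candidate text with its average loss on close paraphrases of that text. `model_loss` and `make_neighbours` (e.g., word substitutions produced by a masked language model) are placeholders, and the gap threshold is illustrative.

```python
# Sketch of a neighbourhood-comparison membership test. `model_loss(text)`
# (per-token loss under the target LM) and `make_neighbours(text, k)` (e.g.,
# word substitutions from a masked LM) are hypothetical placeholders, and the
# gap threshold is illustrative.
def neighbourhood_mia(sample: str, model_loss, make_neighbours,
                      n_neighbours: int = 25, gap_threshold: float = 0.1) -> bool:
    neighbours = make_neighbours(sample, n_neighbours)
    neighbour_avg = sum(model_loss(n) for n in neighbours) / len(neighbours)
    # Training members tend to score noticeably better than their neighbours.
    return (neighbour_avg - model_loss(sample)) > gap_threshold
```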
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
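A density-based membership score in the spirit of DOMIAS can be sketched by comparing how dense the generator's synthetic output is around a candidate point with how dense the (estimated) real data distribution is there; a high ratio suggests local overfitting. Gaussian kernel density estimation is used here only as an illustrative estimator.

```python
# Density-ratio membership score in the spirit of DOMIAS. Gaussian KDE is an
# illustrative density estimator; the paper's estimator and calibration differ.
import numpy as np
from scipy.stats import gaussian_kde


def domias_score(x: np.ndarray, synthetic: np.ndarray,
                 reference: np.ndarray) -> float:
    """x: (d,); synthetic, reference: (n, d). Higher => more likely a member."""
    p_synth = gaussian_kde(synthetic.T)(x)[0]  # density of the generator's output
    p_real = gaussian_kde(reference.T)(x)[0]   # density of the real-data proxy
    return p_synth / max(p_real, 1e-12)
```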
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Leveraging Adversarial Examples to Quantify Membership Information Leakage [30.55736840515317]
We develop a novel approach to address the problem of membership inference in pattern recognition models.
We argue that the magnitude of the perturbation needed to craft an adversarial example for a sample reflects the likelihood of its belonging to the training data.
Our method performs comparably to, or even outperforms, state-of-the-art strategies.
arXiv Detail & Related papers (2022-03-17T19:09:38Z)
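The idea named in the title can be sketched as follows: the size of the smallest perturbation that flips the model's decision approximates the sample's distance to the decision boundary, and that distance serves as the membership signal. The linear toy model below has a closed-form minimal perturbation; for a deep model one would run an adversarial attack (e.g., PGD) and use the norm of the perturbation it finds. The model weights and any decision threshold are assumptions to be calibrated.

```python
# Toy linear classifier where the smallest prediction-flipping (adversarial)
# perturbation has a closed form: the L2 distance to the decision boundary.
# For a deep model, run an attack such as PGD and use the found perturbation's
# norm instead. Weights and any decision threshold are illustrative only.
import numpy as np

w, b = np.array([1.5, -2.0]), 0.3  # stand-in binary classifier: sign(w @ x + b)


def min_flip_perturbation(x: np.ndarray) -> float:
    """Smallest L2 perturbation that moves x across the boundary w @ x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)


def membership_score(x: np.ndarray) -> float:
    # The distance to the boundary is taken as the membership signal; a
    # threshold on this score would be calibrated on known members/non-members.
    return min_flip_perturbation(x)
```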
- Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to reproduce unique sentence fragments from the training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z)
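A rough version of such a leakage probe, under assumptions that differ from the paper's exact metrics, prompts the model with the prefix of a user's training sentence and checks whether the continuation reproduces an n-gram that occurs only in that user's data. `generate` stands in for the trained language model.

```python
# Rough user-level leakage probe; `generate(prompt)` stands in for the trained
# LM, and the prefix length / n-gram size are illustrative choices, not the
# paper's exact metrics.
def ngrams(text: str, n: int) -> set:
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leaks_unique_fragment(generate, sentence: str, unique_user_ngrams: set,
                          prefix_tokens: int = 8, n: int = 5) -> bool:
    prefix = " ".join(sentence.split()[:prefix_tokens])
    continuation = generate(prefix)
    return bool(ngrams(continuation, n) & unique_user_ngrams)
```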
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.