Survey: Leakage and Privacy at Inference Time
- URL: http://arxiv.org/abs/2107.01614v1
- Date: Sun, 4 Jul 2021 12:59:16 GMT
- Title: Survey: Leakage and Privacy at Inference Time
- Authors: Marija Jegorova, Chaitanya Kaul, Charlie Mayor, Alison Q. O'Neil,
Alexander Weir, Roderick Murray-Smith, and Sotirios A. Tsaftaris
- Abstract summary: Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance.
We focus on inference-time leakage, as the most likely scenario for publicly available models.
We propose a taxonomy covering involuntary and malevolent leakage and the available defences, followed by the currently available assessment metrics and applications.
- Score: 59.957056214792665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leakage of data from publicly available Machine Learning (ML) models is an
area of growing significance as commercial and government applications of ML
can draw on multiple sources of data, potentially including users' and clients'
sensitive data. We provide a comprehensive survey of contemporary advances on
several fronts, covering involuntary data leakage which is natural to ML
models, potential malevolent leakage which is caused by privacy attacks, and
currently available defence mechanisms. We focus on inference-time leakage, as
the most likely scenario for publicly available models. We first discuss what
leakage is in the context of different data, tasks, and model architectures. We
then propose a taxonomy covering involuntary and malevolent leakage and the
available defences, followed by the currently available assessment metrics and
applications. We conclude with outstanding challenges and open questions,
outlining some promising directions for future research.
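As a concrete illustration of the kind of inference-time leakage the survey covers, the sketch below shows a minimal confidence-thresholding membership inference attack; the `predict_proba` interface, threshold, and data names are illustrative assumptions, not a method taken from the survey.
```python
# Minimal sketch of a confidence-thresholding membership inference attack,
# one of the simplest forms of inference-time leakage. The model interface,
# threshold, and query data below are illustrative assumptions.
import numpy as np

def confidence_mia(predict_proba, samples, labels, threshold=0.9):
    """Flag a sample as a suspected training member when the model assigns
    its true label a probability above `threshold` (an overconfidence signal)."""
    probs = predict_proba(samples)                        # shape: (n_samples, n_classes)
    true_class_conf = probs[np.arange(len(labels)), labels]
    return true_class_conf > threshold                    # boolean membership guesses

# Example usage with any scikit-learn-style classifier `clf` (hypothetical):
# guesses = confidence_mia(clf.predict_proba, X_query, y_query, threshold=0.95)
```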
Related papers
- FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z)
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage [12.892449128678516]
Fine-tuning language models on private data for downstream applications poses significant privacy risks.
Several popular community platforms now offer convenient distribution of a large variety of pre-trained models.
We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
arXiv Detail & Related papers (2024-08-30T15:35:09Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Where have you been? A Study of Privacy Risk for Point-of-Interest Recommendation [20.526071564917274]
Mobility data can be used to build machine learning (ML) models for location-based services (LBS).
However, the convenience comes with the risk of privacy leakage since this type of data might contain sensitive information related to user identities, such as home/work locations.
We design a privacy attack suite containing data extraction and membership inference attacks tailored for point-of-interest (POI) recommendation models.
arXiv Detail & Related papers (2023-10-28T06:17:52Z)
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
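A rough sketch of how a similarity-based MI signal of this kind could be computed is given below; the unigram-overlap similarity and the `generate_summary` callable are assumptions for illustration, not the paper's implementation.
```python
# Illustrative sketch of a similarity-based membership inference signal for a
# summarization model: if the generated summary is unusually close to the
# reference summary, the (document, summary) pair may have been seen in training.
# `generate_summary` and the overlap measure are assumptions, not the paper's code.
def unigram_f1(candidate: str, reference: str) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = len(set(cand) & set(ref))
    if not cand or not ref or overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def mi_score(generate_summary, document: str, reference_summary: str) -> float:
    """Higher similarity to the reference summary -> stronger membership signal."""
    return unigram_f1(generate_summary(document), reference_summary)
```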
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
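As a minimal illustration of DP data publishing, the sketch below releases a single count query via the Laplace mechanism; the epsilon value and the query are illustrative, and real DP publishing pipelines (including DP generative models) are considerably more involved.
```python
# Minimal sketch of differentially private release of a count query via the
# Laplace mechanism (sensitivity 1). Parameters are illustrative assumptions.
import numpy as np

def dp_count(values, predicate, epsilon=1.0, rng=None):
    """Release |{v : predicate(v)}| with epsilon-DP Laplace noise (scale = 1/epsilon)."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example (hypothetical data): noisy = dp_count(ages, lambda a: a >= 65, epsilon=0.5)
```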
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- A Blackbox Model Is All You Need to Breach Privacy: Smart Grid Forecasting Models as a Use Case [0.7714988183435832]
We show that black-box access to an LSTM model can reveal a significant amount of information, equivalent to having access to the data itself.
This highlights the importance of protecting forecasting models at the same level as the data.
arXiv Detail & Related papers (2023-09-04T11:07:37Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
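The density-ratio intuition behind a DOMIAS-style attack can be sketched as follows; the Gaussian KDE estimator and the 1-D toy setting are simplifying assumptions, not the authors' implementation.
```python
# Rough sketch of the density-ratio idea behind a DOMIAS-style attack: score a
# candidate point by how much denser the synthetic data is around it than a
# reference (population) dataset. KDE and the 1-D setting are simplifications.
import numpy as np
from scipy.stats import gaussian_kde

def domias_style_score(x_candidate, synthetic_data, reference_data):
    """Higher p_synthetic(x) / p_reference(x) suggests local overfitting of the
    generative model around x, i.e. x is more likely a training member."""
    p_syn = gaussian_kde(synthetic_data)(x_candidate)
    p_ref = gaussian_kde(reference_data)(x_candidate)
    return p_syn / np.maximum(p_ref, 1e-12)

# Example (1-D toy arrays, hypothetical):
# scores = domias_style_score(np.array([0.1, 2.5]), syn_samples, ref_samples)
```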
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Privacy in Deep Learning: A Survey [16.278779275923448]
The ever-growing advances of deep learning in many areas have led to the adoption of Deep Neural Networks (DNNs) in production systems.
The availability of large datasets and high computational power are the main contributors to these advances.
This poses serious privacy concerns as this data can be misused or leaked through various vulnerabilities.
arXiv Detail & Related papers (2020-04-25T23:47:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.