Survey: Leakage and Privacy at Inference Time
- URL: http://arxiv.org/abs/2107.01614v1
- Date: Sun, 4 Jul 2021 12:59:16 GMT
- Title: Survey: Leakage and Privacy at Inference Time
- Authors: Marija Jegorova, Chaitanya Kaul, Charlie Mayor, Alison Q. O'Neil,
Alexander Weir, Roderick Murray-Smith, and Sotirios A. Tsaftaris
- Abstract summary: Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance.
We focus on inference-time leakage, as the most likely scenario for publicly available models.
We propose a taxonomy across involuntary and malevolent leakage, available defences, followed by the currently available assessment metrics and applications.
- Score: 59.957056214792665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leakage of data from publicly available Machine Learning (ML) models is an
area of growing significance as commercial and government applications of ML
can draw on multiple sources of data, potentially including users' and clients'
sensitive data. We provide a comprehensive survey of contemporary advances on
several fronts, covering involuntary data leakage which is natural to ML
models, potential malevolent leakage which is caused by privacy attacks, and
currently available defence mechanisms. We focus on inference-time leakage, as
the most likely scenario for publicly available models. We first discuss what
leakage is in the context of different data, tasks, and model architectures. We
then propose a taxonomy across involuntary and malevolent leakage, available
defences, followed by the currently available assessment metrics and
applications. We conclude with outstanding challenges and open questions,
outlining some promising directions for future research.
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented
Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - Where have you been? A Study of Privacy Risk for Point-of-Interest Recommendation [20.526071564917274]
Mobility data can be used to build machine learning (ML) models for location-based services (LBS)
However, the convenience comes with the risk of privacy leakage since this type of data might contain sensitive information related to user identities, such as home/work locations.
We design a privacy attack suite containing data extraction and membership inference attacks tailored for point-of-interest (POI) recommendation models.
arXiv Detail & Related papers (2023-10-28T06:17:52Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - A Blackbox Model Is All You Need to Breach Privacy: Smart Grid
Forecasting Models as a Use Case [0.7714988183435832]
We show that a black box access to an LSTM model can reveal a significant amount of information equivalent to having access to the data itself.
This highlights the importance of protecting forecasting models at the same level as the data.
arXiv Detail & Related papers (2023-09-04T11:07:37Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Survey on the Convergence of Machine Learning and Blockchain [4.45999674917158]
Machine learning (ML) has been pervasively researched nowadays and it has been applied in many aspects of real life.
However, issues of model and data still accompany the development of ML.
With the utilization of blockchain, these problems can be efficiently solved.
arXiv Detail & Related papers (2022-01-04T04:47:45Z) - Privacy in Deep Learning: A Survey [16.278779275923448]
The ever-growing advances of deep learning in many areas have led to the adoption of Deep Neural Networks (DNNs) in production systems.
The availability of large datasets and high computational power are the main contributors to these advances.
This poses serious privacy concerns as this data can be misused or leaked through various vulnerabilities.
arXiv Detail & Related papers (2020-04-25T23:47:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.