Related papers: On the Privacy Effect of Data Enhancement via the Lens of Memorization

On the Privacy Effect of Data Enhancement via the Lens of Memorization

URL: http://arxiv.org/abs/2208.08270v4
Date: Sat, 23 Mar 2024 03:33:32 GMT
Title: On the Privacy Effect of Data Enhancement via the Lens of Memorization
Authors: Xiao Li, Qiongxiu Li, Zhanhao Hu, Xiaolin Hu,
Abstract summary: We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results as they are less likely to identify samples with higher privacy risks. We demonstrate that the generalization gap and privacy leakage are less correlated than those of the previous results.
Score: 20.63044895680223
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning poses severe privacy concerns as it has been shown that the learned models can reveal sensitive information about their training data. Many works have investigated the effect of widely adopted data augmentation and adversarial training techniques, termed data enhancement in the paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set or not. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results as they are less likely to identify samples with higher privacy risks as members compared to samples with low privacy risks. To solve this problem, we deploy a recent attack that can capture individual samples' memorization degrees for evaluation. Through extensive experiments, we unveil several findings about the connections between three essential properties of machine learning models, including privacy, generalization gap, and adversarial robustness. We demonstrate that the generalization gap and privacy leakage are less correlated than those of the previous results. Moreover, there is not necessarily a trade-off between adversarial robustness and privacy as stronger adversarial robustness does not make the model more susceptible to privacy attacks.

Related papers

Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models [28.389198065125314]
selective forgetting (also known as machine unlearning) has shown promise for privacy and data removal tasks.<n>Despite its promise, selective forgetting raises significant privacy concerns.<n>We present the first comprehensive benchmark for evaluating privacy vulnerabilities in selective forgetting.
arXiv Detail & Related papers (2025-12-19T20:04:06Z)
On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis.<n>We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z)
Spurious Privacy Leakage in Neural Networks [6.0067669060060584]
We introduce emphspurious privacy leakage, a phenomenon where spurious groups are significantly more vulnerable to privacy attacks than non-spurious groups.<n>Surprisingly, we find that reducing spurious correlation using spurious robust methods does not mitigate spurious privacy leakage.
arXiv Detail & Related papers (2025-05-26T15:04:39Z)
Membership Inference Attacks fueled by Few-Short Learning to detect privacy leakage tackling data integrity [7.8973037023478785]
Deep learning models memorize parts of their training data, creating a privacy leakage. We propose a Few-Shot learning based MIA, coined as the FeS-MIA model, which eases the evaluation of the privacy breach of a deep learning model. We also propose an interpretable quantitative and qualitative measure of privacy, referred to as Log-MIA measure.
arXiv Detail & Related papers (2025-03-12T13:09:43Z)
DeMem: Privacy-Enhanced Robust Adversarial Learning via De-Memorization [2.473007680791641]
Adversarial robustness is essential for ensuring the trustworthiness of machine learning models in real-world applications. Previous studies have shown that enhancing adversarial robustness through adversarial training increases vulnerability to privacy attacks. We propose DeMem, which selectively targets high-risk samples, achieving a better balance between privacy protection and model robustness.
arXiv Detail & Related papers (2024-12-08T00:22:58Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
$\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $alpha$-Mutual Information as a tunable privacy measure. We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
arXiv Detail & Related papers (2023-10-27T16:26:14Z)
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs. PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Students Parrot Their Teachers: Membership Inference on Model Distillation [54.392069096234074]
We study the privacy provided by knowledge distillation to both the teacher and student training sets. Our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.
arXiv Detail & Related papers (2023-03-06T19:16:23Z)
The Privacy Onion Effect: Memorization is Relative [76.46529413546725]
We show an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable exposes a new layer of previously-safe points to the same attack. It suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.
arXiv Detail & Related papers (2022-06-21T15:25:56Z)
Quantifying and Mitigating Privacy Risks of Contrastive Learning [4.909548818641602]
We perform the first privacy analysis of contrastive learning through the lens of membership inference and attribute inference. Our results show that contrastive models are less vulnerable to membership inference attacks but more vulnerable to attribute inference attacks compared to supervised models. To remedy this situation, we propose the first privacy-preserving contrastive learning mechanism, namely Talos.
arXiv Detail & Related papers (2021-02-08T11:38:11Z)
Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks, trained with differential privacy, in some settings might be even more vulnerable in comparison to non-private versions. We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model. We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance. For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
Systematic Evaluation of Privacy Risks of Machine Learning Models [41.017707772150835]
We show that prior work on membership inference attacks may severely underestimate the privacy risks. We first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks. We then introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score.
arXiv Detail & Related papers (2020-03-24T00:53:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.