Membership Inference Attacks Beyond Overfitting
- URL: http://arxiv.org/abs/2511.16792v1
- Date: Thu, 20 Nov 2025 20:40:56 GMT
- Title: Membership Inference Attacks Beyond Overfitting
- Authors: Mona Khalil, Alberto Blanco-Justicia, Najeeb Jebreel, Josep Domingo-Ferrer,
- Abstract summary: Membership inference attacks (MIAs) aim to determine whether a given data point was part of the model training data.<n>MIAs exploit differences in the behavior of a model when making predictions on samples it has seen during training.<n>Even non-overfitted ML models can leak information about a small subset of their training data.
- Score: 3.549717032380187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Membership inference attacks (MIAs) against machine learning (ML) models aim to determine whether a given data point was part of the model training data. These attacks may pose significant privacy risks to individuals whose sensitive data were used for training, which motivates the use of defenses such as differential privacy, often at the cost of high accuracy losses. MIAs exploit the differences in the behavior of a model when making predictions on samples it has seen during training (members) versus those it has not seen (non-members). Several studies have pointed out that model overfitting is the major factor contributing to these differences in behavior and, consequently, to the success of MIAs. However, the literature also shows that even non-overfitted ML models can leak information about a small subset of their training data. In this paper, we investigate the root causes of membership inference vulnerabilities beyond traditional overfitting concerns and suggest targeted defenses. We empirically analyze the characteristics of the training data samples vulnerable to MIAs in models that are not overfitted (and hence able to generalize). Our findings reveal that these samples are often outliers within their classes (e.g., noisy or hard to classify). We then propose potential defensive strategies to protect these vulnerable samples and enhance the privacy-preserving capabilities of ML models. Our code is available at https://github.com/najeebjebreel/mia_analysis.
Related papers
- ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios [25.37906016731147]
We introduce ImpMIA, a Membership Inference Attack that exploits the Implicit Bias of neural networks.<n>ImpMIA uses the Karush-Kuhn-Tucker optimality conditions to identify training samples.
arXiv Detail & Related papers (2025-10-12T14:12:28Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy [45.413801663923564]
We discuss adaptations of Membership Inference Attacks (MIAs) to the setting of unlearning.
We show that the commonly used U-MIAs in the unlearning literature overestimate the privacy protection afforded by existing unlearning techniques on both vision and language models.
arXiv Detail & Related papers (2024-03-02T14:22:40Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - Do Not Trust Prediction Scores for Membership Inference Attacks [15.567057178736402]
Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model.
We argue that this is a fallacy for many modern deep network architectures.
We are able to produce a potentially infinite number of samples falsely classified as part of the training data.
arXiv Detail & Related papers (2021-11-17T12:39:04Z) - Investigating Membership Inference Attacks under Data Dependencies [26.70764798408236]
Training machine learning models on privacy-sensitive data has opened the door to new attacks that can have serious privacy implications.
One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model.
We evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed.
arXiv Detail & Related papers (2020-10-23T00:16:46Z) - Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z) - Trade-offs between membership privacy & adversarially robust learning [13.37805637358556]
We identify settings where standard models will overfit to a larger extent in comparison to robust models.
The degree of overfitting naturally depends on the amount of data available for training.
arXiv Detail & Related papers (2020-06-08T14:20:12Z) - Data and Model Dependencies of Membership Inference Attack [13.951470844348899]
We provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA.
Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use.
We propose using those data and model properties as regularizers to protect ML models against MIA.
arXiv Detail & Related papers (2020-02-17T09:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.