Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks
- URL: http://arxiv.org/abs/2602.12943v1
- Date: Fri, 13 Feb 2026 14:01:21 GMT
- Title: Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks
- Authors: Osama Zafar, Shaojie Zhan, Tianxi Ji, Erman Ayday,
- Abstract summary: We introduce a novel inference-time defense mechanism called Neighborhood Blending.<n>Our method establishes a consistent confidence pattern, rendering members and non-members indistinguishable to an adversary.<n>It is a model-agnostic approach that offers a practical, lightweight solution that enhances privacy without sacrificing model utility.
- Score: 5.468130838517792
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, the widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive environments, has raised considerable privacy concerns. Of particular importance are membership inference attacks (MIAs), which exploit behavioral discrepancies between training and non-training data to determine whether a specific record was included in the model's training set, thereby presenting significant privacy risks. Although existing defenses, such as adversarial regularization, DP-SGD, and MemGuard, assist in mitigating these threats, they often entail trade-offs such as compromising utility, increased computational requirements, or inconsistent protection against diverse attack vectors. In this paper, we introduce a novel inference-time defense mechanism called Neighborhood Blending, which mitigates MIAs without retraining the model or incurring significant computational overhead. Our approach operates post-training by smoothing the model's confidence outputs based on the neighborhood of a queried sample. By averaging predictions from similar training samples selected using differentially private sampling, our method establishes a consistent confidence pattern, rendering members and non-members indistinguishable to an adversary while maintaining high utility. Significantly, Neighborhood Blending maintains label integrity (zero label loss) and ensures high utility through an adaptive, "pay-as-you-go" distortion strategy. It is a model-agnostic approach that offers a practical, lightweight solution that enhances privacy without sacrificing model utility. Through extensive experiments across diverse datasets and models, we demonstrate that our defense significantly reduces MIA success rates while preserving model performance, outperforming existing post-hoc defenses like MemGuard and training-time techniques like DP-SGD in terms of utility retention.
Related papers
- GShield: Mitigating Poisoning Attacks in Federated Learning [2.6260952524631787]
Federated Learning (FL) has recently emerged as a revolutionary approach to collaborative training Machine Learning models.<n>It enables decentralized model training while preserving data privacy, but its distributed nature makes it highly vulnerable to a severe attack known as Data Poisoning.<n>We present a novel defense mechanism called GShield, designed to detect and mitigate malicious and low-quality updates.
arXiv Detail & Related papers (2025-12-22T11:29:28Z) - Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments [0.6850683267295249]
Federated Learning (FL) allows for the training of Machine Learning models in a collaborative manner without the need to share sensitive data.<n>It remains vulnerable to Gradient Leakage Attacks (GLAs), which can reveal private information from the shared model updates.<n>We investigate the effectiveness of Differential Privacy mechanisms as defenses against GLAs.
arXiv Detail & Related papers (2025-10-27T23:33:21Z) - MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - AdaMixup: A Dynamic Defense Framework for Membership Inference Attack Mitigation [3.593261887572616]
Membership inference attacks have emerged as a significant privacy concern in the training of deep learning models.<n>AdaMixup employs adaptive mixup techniques to enhance the model's robustness against membership inference attacks.
arXiv Detail & Related papers (2025-01-04T04:21:48Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Overconfidence is a Dangerous Thing: Mitigating Membership Inference
Attacks by Enforcing Less Confident Prediction [2.2336243882030025]
Machine learning models are vulnerable to membership inference attacks (MIAs)
This work proposes a defense technique, HAMP, that can achieve both strong membership privacy and high accuracy, without requiring extra data.
arXiv Detail & Related papers (2023-07-04T09:50:33Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.