Related papers: Quantifying Membership Inference Vulnerability via Generalization Gap and Other Model Metrics

Quantifying Membership Inference Vulnerability via Generalization Gap and Other Model Metrics

URL: http://arxiv.org/abs/2009.05669v1
Date: Fri, 11 Sep 2020 21:53:50 GMT
Title: Quantifying Membership Inference Vulnerability via Generalization Gap and Other Model Metrics
Authors: Jason W. Bentley, Daniel Gibney, Gary Hoppenworth, Sumit Kumar Jha
Abstract summary: We show how a target model's generalization gap leads directly to an effective deterministic black box membership inference attack (MIA) This attack is shown to be optimal in the expected sense given access to only certain likely obtainable metrics regarding the network's training and performance.
Score: 4.416432468665362
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We demonstrate how a target model's generalization gap leads directly to an effective deterministic black box membership inference attack (MIA). This provides an upper bound on how secure a model can be to MIA based on a simple metric. Moreover, this attack is shown to be optimal in the expected sense given access to only certain likely obtainable metrics regarding the network's training and performance. Experimentally, this attack is shown to be comparable in accuracy to state-of-art MIAs in many cases.

Related papers

Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks aim to infer whether a target data record has been utilized for model training. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA)
arXiv Detail & Related papers (2023-11-10T13:55:05Z)
When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks [17.243744418309593]
We propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results. We also explore potential strategies for mitigating privacy leakages.
arXiv Detail & Related papers (2023-11-07T10:28:17Z)
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study [17.421886085918608]
Membership inference attacks (MIAs) aim to infer whether a data point has been used to train a machine learning model. These attacks can be employed to identify potential privacy vulnerabilities and detect unauthorized use of personal data. This paper takes a first step towards developing practical MIAs against large-scale multi-modal models.
arXiv Detail & Related papers (2023-09-29T19:38:40Z)
Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model attacks (MIAs) aim to recover private data from inaccessible training sets of deep learning models. This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label. Experimental results show that this method can generate similar and accurate samples to the target label, outperforming generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks [65.86360607693457]
No-box attacks, where adversaries have no prior knowledge, remain relatively underexplored despite its practical relevance. This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs) as surrogate models for executing no-box attacks. Our theoretical and empirical analyses reveal a key limitation in the execution of no-box attacks stemming from insufficient discriminative capabilities for direct application of vanilla CLIP as a surrogate model. We propose MF-CLIP: a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization.
arXiv Detail & Related papers (2023-07-13T08:10:48Z)
Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle. While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples. This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target. RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead. Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. We show we can obtain a Meta-Surrogate Model (MSM) such that attacks to this model can be easier transferred to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)
Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning. These measures should account for the wide variety of models used in practice. The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs) We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases. Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
Membership Inference Attacks and Defenses in Classification Models [19.498313593713043]
We study the membership inference (MI) attack against classifiers. We find that a model's vulnerability to MI attacks is tightly related to the generalization gap. We propose a defense against MI attacks that aims to close the gap by intentionally reducing the training accuracy.
arXiv Detail & Related papers (2020-02-27T12:35:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.