Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks
- URL: http://arxiv.org/abs/2404.07139v1
- Date: Wed, 10 Apr 2024 16:14:05 GMT
- Title: Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks
- Authors: Kavita Kumari, Murtuza Jadliwala, Sumit Kumar Jha, Anindya Maiti
- Abstract summary: Explanations of black-box machine learning (ML) models can be exploited to carry out privacy threats such as membership inference attacks (MIA).
Existing works have only analyzed MIA in a single "what if" interaction scenario between an adversary and the target ML model.
We propose a sound mathematical formulation to prove that an optimal explanation-variance threshold exists, which can be used to launch MIA.
- Score: 8.06071340190569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model explanations improve the transparency of black-box machine learning (ML) models and their decisions; however, they can also be exploited to carry out privacy threats such as membership inference attacks (MIA). Existing works have only analyzed MIA in a single "what if" interaction scenario between an adversary and the target ML model; thus, they do not discern the factors impacting the capabilities of an adversary in launching MIA in repeated interaction settings. Additionally, these works rely on assumptions about the adversary's knowledge of the target model's structure and, thus, do not guarantee the optimality of the predefined threshold required to distinguish the members from non-members. In this paper, we delve into the domain of explanation-based threshold attacks, where the adversary endeavors to carry out MIA by leveraging the variance of explanations through iterative interactions with the system comprising the target ML model and its corresponding explanation method. We model such interactions by employing a continuous-time stochastic signaling game framework. In our framework, an adversary plays a stopping game, interacting with the system (which has imperfect information about the type of the adversary, i.e., honest or malicious) to obtain explanation variance information and to compute an optimal threshold for accurately determining the membership of a datapoint. First, we propose a sound mathematical formulation to prove that such an optimal threshold exists, which can be used to launch MIA. Then, we characterize the conditions under which a unique Markov perfect equilibrium (or steady state) exists in this dynamic system. By means of a comprehensive set of simulations of the proposed game model, we assess different factors that can impact the capability of an adversary to launch MIA in such repeated interaction settings.
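As a concrete illustration of the attack surface described above, here is a minimal sketch of an explanation-based threshold attack: compute a gradient-based explanation for a queried datapoint and compare its variance against a threshold. This is not the paper's game-theoretic procedure; the saliency explanation, the toy model, the threshold value `tau`, and the direction of the comparison are all assumptions for illustration.

```python
# Minimal sketch of an explanation-variance threshold attack. The gradient-based saliency
# explanation, the toy model, the threshold tau, and the direction of the comparison are
# assumptions; in the paper, the optimal threshold is derived from the signaling-game equilibrium.
import torch
import torch.nn as nn

def saliency_explanation(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Gradient of the target logit w.r.t. the input: a simple model explanation."""
    x = x.clone().requires_grad_(True)
    model(x.unsqueeze(0))[0, target].backward()
    return x.grad.detach()

def infer_membership(model: nn.Module, x: torch.Tensor, target: int, tau: float) -> bool:
    """Declare 'member' when the explanation variance falls below the threshold tau."""
    return saliency_explanation(model, x, target).var().item() < tau

# Toy usage with a hypothetical model and threshold.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
print(infer_membership(model, torch.randn(20), target=0, tau=0.05))
```

In the paper's setting, `tau` corresponds to the optimal threshold the adversary computes through repeated interactions with the system, rather than a fixed, predefined value.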
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
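For context, a generic projected-gradient (PGD) attack on a surrogate model is sketched below; transfer-based attacks craft perturbations on the surrogate and then apply them to fine-tuned downstream models. This is not the paper's UMI algorithm, and the step size, budget, and iteration count are assumed values.

```python
# Generic PGD attack on a surrogate model (a common transfer-attack baseline, not UMI).
import torch
import torch.nn as nn

def pgd_attack(surrogate: nn.Module, x: torch.Tensor, y: torch.Tensor,
               eps: float = 8 / 255, alpha: float = 2 / 255, steps: int = 10) -> torch.Tensor:
    """Craft perturbations on the surrogate; they are then transferred to downstream models."""
    loss_fn = nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(loss_fn(surrogate(x_adv), y), x_adv)[0]
        # Gradient-sign step, projected back into the eps-ball and the valid pixel range.
        x_adv = (x_adv.detach() + alpha * grad.sign()).clamp(x - eps, x + eps).clamp(0.0, 1.0)
    return x_adv
```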
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Adversary-Augmented Simulation to evaluate fairness on HyperLedger Fabric [0.0]
This paper builds upon concepts such as adversarial assumptions, goals, and capabilities.
It classifies and constrains the use of adversarial actions based on classical distributed system models.
The objective is to study the effects of these allowed actions on the properties of protocols under various system models.
arXiv Detail & Related papers (2024-03-21T12:20:36Z)
- On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks [42.18575921329484]
We analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework.
We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs.
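A hedged sketch of the likelihood-ratio test at the core of LiRA follows: the observed loss of a target record is scored under Gaussian models of the "in" and "out" loss distributions estimated from shadow models. The paper's information-theoretic bounds are not reproduced here, and the Gaussian modeling and calibration threshold are assumptions.

```python
# Likelihood-ratio membership score in the style of LiRA (assumed setup: per-example
# losses from shadow models trained with the record IN vs. OUT of the training set).
import numpy as np
from scipy.stats import norm

def lira_score(target_loss: float, in_losses: np.ndarray, out_losses: np.ndarray) -> float:
    """Log-likelihood ratio of the observed loss under 'member' vs. 'non-member' models."""
    mu_in, sd_in = in_losses.mean(), in_losses.std() + 1e-8
    mu_out, sd_out = out_losses.mean(), out_losses.std() + 1e-8
    return norm.logpdf(target_loss, mu_in, sd_in) - norm.logpdf(target_loss, mu_out, sd_out)

# Predict "member" when the ratio exceeds a calibration threshold (0.0 here is an assumption).
print(lira_score(0.1, np.array([0.05, 0.08, 0.12]), np.array([0.9, 1.1, 0.7])) > 0.0)
```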
arXiv Detail & Related papers (2024-02-16T13:41:18Z)
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
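A simplified sketch of frequency-domain filtering in the spirit of FreqFed is given below: each flattened client update is transformed with a DCT, its low-frequency components are compared to the majority, and outlying updates are dropped before averaging. The actual mechanism clusters the low-frequency components; the distance-to-median filter, the number of retained coefficients, and the toy data are assumptions.

```python
# Sketch of frequency-domain filtering before aggregation (an assumed simplification of
# FreqFed: the real mechanism clusters low-frequency DCT components of client updates).
import numpy as np
from scipy.fft import dct

def freq_filter_aggregate(updates: list[np.ndarray], keep: int = 32) -> np.ndarray:
    spectra = np.stack([dct(u, norm="ortho")[:keep] for u in updates])  # low-frequency part
    center = np.median(spectra, axis=0)
    dists = np.linalg.norm(spectra - center, axis=1)
    accepted = dists <= np.median(dists)  # drop updates whose spectra are outliers
    return np.mean(np.stack(updates)[accepted], axis=0)

# Toy usage: nine benign updates and one scaled, poisoned-looking update.
benign = [np.random.randn(1000) * 0.01 for _ in range(9)]
poisoned = [np.random.randn(1000) * 5.0]
aggregated = freq_filter_aggregate(benign + poisoned)
```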
arXiv Detail & Related papers (2023-12-07T16:56:24Z)
- JAB: Joint Adversarial Prompting and Belief Augmentation [81.39548637776365]
We introduce a joint framework in which we probe and improve the robustness of a black-box target model via adversarial prompting and belief augmentation.
This framework utilizes an automated red teaming approach to probe the target model, along with a belief augmenter to generate instructions for the target model to improve its robustness to those adversarial probes.
arXiv Detail & Related papers (2023-11-16T00:35:54Z)
- Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks (MIAs) aim to infer whether a target data record has been utilized for model training or not.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal.
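The core calibration idea can be sketched as a reference-calibrated membership score: the target model's log-likelihood of a record minus that of a reference model (in SPV-MIA, a model fine-tuned on text self-prompted from the target LLM). The paper's probabilistic-variation signal is omitted here, and the callables and threshold below are assumptions.

```python
# Reference-calibrated membership score (a simplification of SPV-MIA: the reference model
# would be fine-tuned on text self-prompted from the target LLM, and the probabilistic-
# variation signal from the paper is omitted).
from typing import Callable

def calibrated_membership_score(text: str,
                                target_loglik: Callable[[str], float],
                                reference_loglik: Callable[[str], float]) -> float:
    """Higher score -> the record is more likely to be in the target's fine-tuning set."""
    return target_loglik(text) - reference_loglik(text)

def is_member(text: str,
              target_loglik: Callable[[str], float],
              reference_loglik: Callable[[str], float],
              tau: float = 0.0) -> bool:  # tau is an assumed calibration threshold
    return calibrated_membership_score(text, target_loglik, reference_loglik) > tau
```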
arXiv Detail & Related papers (2023-11-10T13:55:05Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on modular, re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- Quantifying Membership Inference Vulnerability via Generalization Gap and Other Model Metrics [4.416432468665362]
We show how a target model's generalization gap leads directly to an effective deterministic black-box membership inference attack (MIA).
This attack is shown to be optimal in the expected sense given access only to certain metrics about the network's training and performance that an adversary is likely able to obtain.
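One well-known deterministic baseline consistent with this idea is the "gap attack": predict "member" exactly when the target model classifies the record correctly, so the attack's advantage equals the train/test accuracy gap. Whether this coincides with the paper's optimal attack is not claimed here; the sketch below is a generic illustration.

```python
# The "gap attack": a label-only, deterministic membership test. This is a generic
# baseline, not necessarily the exact attack analyzed in the paper.
from typing import Callable
import numpy as np

def gap_attack(predict: Callable[[np.ndarray], int], x: np.ndarray, y_true: int) -> bool:
    """Predict 'member' iff the black-box model classifies the record correctly."""
    return int(predict(x)) == int(y_true)
```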
arXiv Detail & Related papers (2020-09-11T21:53:50Z)
- Improving Robustness to Model Inversion Attacks via Mutual Information Regularization [12.079281416410227]
This paper studies defense mechanisms against model inversion (MI) attacks.
MI is a type of privacy attack that aims to infer information about the training data distribution given access to a target machine learning model.
We propose the Mutual Information Regularization based Defense (MID) against MI attacks.
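A hedged illustration of mutual-information regularization via a variational (information-bottleneck style) bound is sketched below: a stochastic representation is penalized with a KL term toward a standard normal prior, which upper-bounds the mutual information between the input and that representation. MID's exact estimator and architecture may differ; the layer sizes and regularization weight are assumptions.

```python
# Variational mutual-information penalty (information-bottleneck style): the KL term
# upper-bounds I(X; Z). The exact MID formulation may differ; sizes and lambda_mi are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIRegularizedClassifier(nn.Module):
    def __init__(self, in_dim=20, z_dim=16, n_classes=2):
        super().__init__()
        self.encoder = nn.Linear(in_dim, 2 * z_dim)  # outputs mean and log-variance
        self.head = nn.Linear(z_dim, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        return self.head(z), kl

def loss_fn(logits, kl, y, lambda_mi=1e-2):
    return F.cross_entropy(logits, y) + lambda_mi * kl  # utility term + MI penalty
```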
arXiv Detail & Related papers (2020-09-11T06:02:44Z)
- Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
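A simplified min-max training loop in the spirit of AEG is sketched below: a generator crafts bounded perturbations to maximize the classifier's loss while the classifier is trained on the perturbed inputs. The paper's exact objective over a hypothesis class and its optimization scheme are more involved; the architectures, perturbation budget, and toy data are assumptions.

```python
# Simplified generator-vs-classifier min-max loop (not the paper's full AEG objective).
import torch
import torch.nn as nn
import torch.nn.functional as F

eps = 0.1
classifier = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
generator = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 20), nn.Tanh())
opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 20)                       # toy data stands in for a real dataset
    y = torch.randint(0, 2, (32,))
    x_adv = x + eps * generator(x)                # Tanh output keeps the perturbation bounded

    # Generator ascends the classifier's loss on its perturbed examples.
    opt_g.zero_grad()
    (-F.cross_entropy(classifier(x_adv), y)).backward()
    opt_g.step()

    # Classifier descends the loss on freshly perturbed (detached) inputs.
    opt_c.zero_grad()
    F.cross_entropy(classifier(x + eps * generator(x).detach()), y).backward()
    opt_c.step()
```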
arXiv Detail & Related papers (2020-07-01T19:47:23Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)