GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
- URL: http://arxiv.org/abs/2405.07562v1
- Date: Mon, 13 May 2024 08:52:04 GMT
- Title: GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
- Authors: Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets
- Abstract summary: We explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks.
We propose GLiRA, a distillation-guided approach to membership inference attack on the black-box neural network.
We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks, when guided by knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
- Score: 4.332441337407564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks. In particular, we propose {GLiRA}, a distillation-guided approach to membership inference attack on the black-box neural network. We observe that knowledge distillation significantly improves the efficiency of the likelihood ratio membership inference attack, especially in the black-box setting, i.e., when the architecture of the target model is unknown to the attacker. We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks, when guided by knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
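The pipeline described in the abstract (train a distilled surrogate of the black-box target, then run a likelihood-ratio test on per-example confidences) can be sketched as follows. This is a minimal illustration of generic knowledge distillation and LiRA-style scoring, not the authors' implementation; all function names, the Gaussian modeling choice, and the hyperparameters are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened outputs: the
    objective used to train a surrogate that mimics the black-box
    target's soft predictions (generic Hinton-style distillation)."""
    p = softmax(teacher_logits, T)   # teacher soft labels
    q = softmax(student_logits, T)   # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T / p.shape[0])

def _normal_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def lira_score(conf, in_confs, out_confs, eps=1e-6):
    """Parametric likelihood-ratio test in the LiRA style: fit
    Gaussians to the example's confidence under shadow models trained
    with ('in') and without ('out') the example, and return the
    log-likelihood ratio. Positive values favour membership."""
    mu_in, sd_in = np.mean(in_confs), np.std(in_confs) + eps
    mu_out, sd_out = np.mean(out_confs), np.std(out_confs) + eps
    return float(_normal_logpdf(conf, mu_in, sd_in)
                 - _normal_logpdf(conf, mu_out, sd_out))
```

In this sketch the surrogate trained with `distillation_loss` would stand in for the unknown-architecture target when building the shadow-model confidence distributions that `lira_score` consumes.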
Related papers
- Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability [10.632831321114502]
We propose an attack-driven explainable framework to identify the most influential features of raw data that lead to successful membership inference attacks.
Our proposed MIA shows an improvement of up to 26% over the state-of-the-art MIA.
arXiv Detail & Related papers (2024-07-01T14:07:46Z) - Evaluating Membership Inference Through Adversarial Robustness [6.983991370116041]
We propose an enhanced methodology for membership inference attacks based on adversarial robustness.
We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-05-14T06:48:47Z) - Saliency Diversified Deep Ensemble for Robustness to Adversaries [1.9659095632676094]
This work proposes a novel diversity-promoting learning approach for deep ensembles.
The idea is to promote saliency map diversity (SMD) on ensemble members to prevent the attacker from targeting all ensemble members at once.
We empirically show a reduced transferability between ensemble members and improved performance compared to the state-of-the-art ensemble defense.
arXiv Detail & Related papers (2021-12-07T10:18:43Z) - Formalizing and Estimating Distribution Inference Risks [11.650381752104298]
We propose a formal and general definition of property inference attacks.
Our results show that inexpensive attacks are as effective as expensive meta-classifier attacks.
We extend the state-of-the-art property inference attack to work on convolutional neural networks.
arXiv Detail & Related papers (2021-09-13T14:54:39Z) - Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z) - Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of the data used in the knowledge-stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z) - Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z) - Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z) - An Extension of Fano's Inequality for Characterizing Model Susceptibility to Membership Inference Attacks [28.366183028100057]
We show that the probability of success for a membership inference attack on a deep neural network can be bounded using the mutual information between its inputs and its activations.
This enables the use of mutual information to measure the susceptibility of a DNN model to membership inference attacks.
arXiv Detail & Related papers (2020-09-17T06:37:15Z) - Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
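A gradient-perturbation defense of the kind mentioned above is typically implemented in the style of DP-SGD: clip each per-example gradient, average, and add calibrated Gaussian noise. A minimal sketch of one such step; the function name, parameter names, and values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One gradient-perturbation step: clip each per-example gradient
    to L2 norm `clip_norm`, average the clipped gradients, and add
    Gaussian noise whose scale is `noise_mult * clip_norm / batch`."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down only gradients whose norm exceeds the clip bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return avg + noise
```

The clipping bounds each example's influence on the update, and the noise masks what remains, which is what limits the information a membership adversary can extract from the trained model.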
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.