GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
- URL: http://arxiv.org/abs/2405.07562v1
- Date: Mon, 13 May 2024 08:52:04 GMT
- Title: GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
- Authors: Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets
- Abstract summary: We explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks.
We propose GLiRA, a distillation-guided approach to membership inference attack on the black-box neural network.
We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks, when guided by knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
- Score: 4.332441337407564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks. In particular, we propose GLiRA, a distillation-guided approach to membership inference attack on a black-box neural network. We observe that knowledge distillation significantly improves the efficiency of the likelihood-ratio membership inference attack, especially in the black-box setting, i.e., when the architecture of the target model is unknown to the attacker. We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks, when guided by knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
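To make the distillation-guided likelihood-ratio idea concrete, here is a minimal sketch of a LiRA-style membership score in which the shadow models are distilled from the black-box target (trained to match its output probabilities) rather than trained independently. The helper names (`predict_proba`, the Gaussian fit, the clipping constants) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: LiRA-style likelihood-ratio test with distillation-trained
# shadow models. All helper names and constants are illustrative assumptions.
import numpy as np
from scipy.stats import norm


def logit_confidence(probs, label):
    """LiRA-style logit scaling of the confidence assigned to the true label."""
    p = float(np.clip(probs[label], 1e-6, 1 - 1e-6))
    return np.log(p) - np.log(1.0 - p)


def membership_score(x, y, target_probs, shadows_in, shadows_out):
    """Log-likelihood ratio that (x, y) was in the target's training set.

    shadows_in / shadows_out: shadow models distilled from the black-box target
    (trained on its soft labels), with / without (x, y) in their training data.
    """
    obs = logit_confidence(target_probs, y)
    phi_in = np.array([logit_confidence(m.predict_proba(x), y) for m in shadows_in])
    phi_out = np.array([logit_confidence(m.predict_proba(x), y) for m in shadows_out])
    # Fit a Gaussian to each shadow population and compare the likelihoods of the
    # observed target-model confidence under the member / non-member hypotheses.
    l_in = norm.logpdf(obs, phi_in.mean(), phi_in.std() + 1e-8)
    l_out = norm.logpdf(obs, phi_out.mean(), phi_out.std() + 1e-8)
    return l_in - l_out  # higher => more likely a training member
```

A threshold on this score, swept to trade off true- and false-positive rates, would then yield the membership decision.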
Related papers
- Edge-Only Universal Adversarial Attacks in Distributed Learning [49.546479320670464]
In this work, we explore the feasibility of generating universal adversarial attacks when an attacker has access to the edge part of the model only.
Our approach shows that adversaries can induce effective mispredictions in the unknown cloud part by leveraging key features on the edge side.
Our results on ImageNet demonstrate strong attack transferability to the unknown cloud part.
arXiv Detail & Related papers (2024-11-15T11:06:24Z)
- Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples [1.1820990818670631]
This work is the first to provide provable guarantees on the success of a knowledge distillation-based attack on classification neural networks.
We prove that if the student model has sufficient learning capacity, the attack on the teacher model is guaranteed to be found within a finite number of distillation iterations.
arXiv Detail & Related papers (2024-10-21T11:06:56Z)
- Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability [10.632831321114502]
We propose an attack-driven explainable framework to identify the most influential features of raw data that lead to successful membership inference attacks.
Our proposed MIA shows an improvement of up to 26% over state-of-the-art MIAs.
arXiv Detail & Related papers (2024-07-01T14:07:46Z)
- Evaluating Membership Inference Through Adversarial Robustness [6.983991370116041]
We propose an enhanced methodology for membership inference attacks based on adversarial robustness (a rough sketch of this idea follows below).
We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100.
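One plausible reading of "membership inference based on adversarial robustness" is to use the perturbation budget needed to flip a sample's prediction as the membership signal; the sketch below illustrates that reading with iterative FGSM in PyTorch, and is an assumption rather than the paper's exact method.

```python
# Hedged sketch: approximate distance to the decision boundary as a membership
# signal (members tend to need larger perturbations to change their label).
# The step size, budget, and [0, 1] input range are assumptions.
import torch
import torch.nn.functional as F


def robustness_score(model, x, y, eps_step=1 / 255, max_steps=64):
    """Smallest iterative-FGSM budget that flips the prediction (batch of one)."""
    x_adv = x.clone()
    for step in range(1, max_steps + 1):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + eps_step * grad.sign()).clamp(0.0, 1.0)
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() != int(y.item()):
                return step * eps_step  # budget needed to flip the label
    return max_steps * eps_step         # never flipped within the budget
```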
arXiv Detail & Related papers (2022-05-14T06:48:47Z)
- Formalizing and Estimating Distribution Inference Risks [11.650381752104298]
We propose a formal and general definition of property inference attacks.
Our results show that inexpensive attacks are as effective as expensive meta-classifier attacks.
We extend the state-of-the-art property inference attack to work on convolutional neural networks.
arXiv Detail & Related papers (2021-09-13T14:54:39Z)
- Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of the data used in the knowledge-stealing process.
The combination of these two modules further boosts the consistency between the substitute model and the target model, which greatly improves the effectiveness of the adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost in clean accuracy (a minimal sketch of the idea follows below).
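As a rough illustration of the anti-adversary idea (not the authors' implementation), the sketch below nudges the input in the direction that increases the model's confidence in its own prediction, i.e. the opposite of a gradient-sign adversarial step; the step size, iteration count, and input range are assumptions.

```python
# Hedged sketch of an anti-adversary preprocessing step (PyTorch).
# Step size, iteration count, and the [0, 1] input range are assumptions.
import torch
import torch.nn.functional as F


def anti_adversary_forward(model, x, steps=2, eps=8 / 255):
    with torch.no_grad():
        pred = model(x).argmax(dim=1)  # the model's own prediction on the raw input
    x_aa = x.clone()
    for _ in range(steps):
        x_aa = x_aa.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_aa), pred)
        grad = torch.autograd.grad(loss, x_aa)[0]
        # Opposite of an FGSM step: *decrease* the loss of the predicted class,
        # pushing the input away from the decision boundary.
        x_aa = (x_aa - (eps / steps) * grad.sign()).clamp(0.0, 1.0)
    return model(x_aa.detach())
```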
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
- Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack can greatly decrease the number of queries required in the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z)
- An Extension of Fano's Inequality for Characterizing Model Susceptibility to Membership Inference Attacks [28.366183028100057]
We show that the probability of success for a membership inference attack on a deep neural network can be bounded using the mutual information between its inputs and its activations.
This enables the use of mutual information to measure the susceptibility of a DNN model to membership inference attacks (a generic Fano-style step is sketched below).
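For background only (this is the textbook Fano argument, not necessarily the paper's exact bound): writing M for the binary membership bit and A for the activations observed by the adversary, Fano's inequality ties the attack's error probability to mutual information.

```latex
% Textbook Fano-style step for a binary membership bit M and adversary
% observations A; an illustrative background identity, not the paper's bound.
P_e = \Pr\bigl[\hat{M} \neq M\bigr], \qquad
H_b(P_e) \;\ge\; H(M \mid A) \;=\; H(M) - I(M; A)
```

Here H_b is the binary entropy: the more mutual information the adversary's observations carry about membership, the smaller the attack error this inequality permits.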
arXiv Detail & Related papers (2020-09-17T06:37:15Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores (a label-only sketch follows this list).
We show that a victim model that only publishes labels is still susceptible to sampling attacks, and the adversary can recover up to 100% of its performance.
As a defense, we apply differential privacy in the form of gradient perturbation during training of the victim model, as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
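As referenced above, here is a minimal sketch of one way a label-only adversary can exploit repeated queries: perturb the input many times and use the stability of the returned label as a membership signal. The `query_label` callback, the Gaussian noise model, and the score are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of a label-only "sampling" membership signal.
# `query_label`, the noise model, and sigma are assumptions for illustration.
import numpy as np


def sampling_attack_score(x, y, query_label, n_queries=100, sigma=0.05, seed=0):
    """Fraction of perturbed queries whose predicted label still equals y."""
    rng = np.random.default_rng(seed)
    agree = 0
    for _ in range(n_queries):
        x_noisy = x + sigma * rng.standard_normal(x.shape)  # perturbed query
        agree += int(query_label(x_noisy) == y)              # labels only, no scores
    # Training members tend to lie further from the decision boundary, so their
    # label is more stable under perturbation; threshold this score to decide.
    return agree / n_queries
```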
This list is automatically generated from the titles and abstracts of the papers on this site.