A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots
- URL: http://arxiv.org/abs/2302.11982v1
- Date: Thu, 23 Feb 2023 12:57:34 GMT
- Title: A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots
- Authors: Boyang Zhang, Xinlei He, Yun Shen, Tianhao Wang, Yang Zhang
- Abstract summary: It is well known that an adversary can leverage a target ML model's output to steal the model's information.
We propose a new side channel for model information stealing attacks, i.e., models' scientific plots.
- Score: 14.998272283348152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building advanced machine learning (ML) models requires expert knowledge and
many trials to discover the best architecture and hyperparameter settings.
Previous work demonstrates that model information can be leveraged to assist
other attacks, such as membership inference and adversarial example generation.
Therefore, such information, e.g., hyperparameters, should be kept
confidential. It is well known that an adversary can leverage a target ML
model's output to steal the model's information. In this paper, we discover a
new side channel for model information stealing attacks, i.e., models'
scientific plots which are extensively used to demonstrate model performance
and are easily accessible. Our attack is simple and straightforward: we
leverage shadow model training techniques to generate training data for the
attack model, which is essentially an image classifier. Extensive evaluation on
three benchmark datasets shows that our proposed attack can effectively infer
the architecture/hyperparameters of convolutional neural network (CNN) based
image classifiers given the scientific plots generated from them. We also
reveal that the attack's success is mainly caused by the shape of the
scientific plots, and further demonstrate that the attacks are robust in
various scenarios. Given the simplicity and effectiveness of the attack method,
our study indicates scientific plots indeed constitute a valid side channel for
model information stealing attacks. To mitigate the attacks, we propose several
defense mechanisms that can reduce the original attacks' accuracy while
maintaining the plot utility. However, such defenses can still be bypassed by
adaptive attacks.
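As a rough illustration of the pipeline, the toy sketch below trains shadow models with different "secret" learning rates, rasterizes their loss-curve plots, and fits an image classifier that predicts the learning rate from the plot pixels alone. All names, the hyperparameter grid, and the use of sklearn MLPs as shadow models with a logistic-regression attack model are illustrative assumptions, not the paper's actual CNN-based setup.

```python
# Toy sketch of the plot-based attack pipeline (illustrative assumptions:
# sklearn MLP shadow models and a logistic-regression attack model stand in
# for the paper's CNN-based setup).
import io

import matplotlib
matplotlib.use("Agg")  # render plots off-screen
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def render_loss_plot(loss_curve, dpi=32):
    """Rasterize a training-loss plot and return grayscale pixels."""
    fig, ax = plt.subplots(figsize=(2, 2), dpi=dpi)
    ax.plot(loss_curve)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    rgba = plt.imread(buf)  # (H, W, 4) float array
    return rgba[:, :, :3].mean(axis=2).ravel()

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
learning_rates = [0.001, 0.01, 0.1]  # the "secret" hyperparameter to infer

plots, labels = [], []
for label, lr in enumerate(learning_rates):
    for seed in range(10):  # several shadow models per setting
        shadow = MLPClassifier(hidden_layer_sizes=(32,),
                               learning_rate_init=lr,
                               max_iter=50, random_state=seed)
        shadow.fit(X, y)
        plots.append(render_loss_plot(shadow.loss_curve_))
        labels.append(label)

# The attack model is just an image classifier over the rendered plots.
attack_model = LogisticRegression(max_iter=1000)
attack_model.fit(np.array(plots), np.array(labels))
```

The key point the sketch captures is that the attack model never touches the shadow models' weights or raw outputs; it sees only the rendered plot images, exactly the side channel the paper identifies.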
Related papers
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, InI trains models to produce uninformative outputs against stealing queries.
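A minimal sketch of the general "uninformative outputs" idea, assuming a setup where `ood_x` stands in for suspected stealing queries; this is a generic rendering, not InI's exact isolation-and-induction objective:

```python
# Generic sketch of training for uninformative outputs (assumption: ood_x
# stands in for suspected stealing queries; this is not the exact InI loss).
import torch
import torch.nn.functional as F

def defense_loss(model, benign_x, benign_y, ood_x, lam=1.0):
    # Ordinary cross-entropy keeps benign accuracy intact.
    task_loss = F.cross_entropy(model(benign_x), benign_y)
    # Push posteriors on stealing-like queries toward the uniform
    # distribution, so such queries return as little information as possible.
    log_probs = F.log_softmax(model(ood_x), dim=1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.size(1))
    uninformative = F.kl_div(log_probs, uniform, reduction="batchmean")
    return task_loss + lam * uninformative
```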
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Scalable Membership Inference Attacks via Quantile Regression [35.33158339354343]
Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not.
We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training.
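The test itself is easy to sketch. In the hedged version below (function names, feature choice, and the 0.95 quantile level are illustrative assumptions), a quantile regressor is fit on known non-member points to predict a high quantile of the target model's confidence per example; an example whose observed confidence exceeds its predicted quantile is flagged as a member.

```python
# Sketch of a quantile-regression membership test (feature choice,
# quantile level, and names are illustrative assumptions).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_quantile_attack(nonmember_features, nonmember_confidences, q=0.95):
    """Learn a per-example q-quantile of confidence on non-members."""
    reg = GradientBoostingRegressor(loss="quantile", alpha=q)
    reg.fit(nonmember_features, nonmember_confidences)
    return reg

def predict_membership(reg, features, confidences):
    # Flag as member iff the observed confidence exceeds the predicted
    # quantile of the non-member confidence distribution.
    return confidences > reg.predict(features)
```

One appeal of this formulation is that the regressor is trained only on points known not to be in training, so no shadow copies of the target model are needed.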
arXiv Detail & Related papers (2023-07-07T16:07:00Z)
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
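In spirit, the parsing step can be sketched as plain supervised learning over perturbations. The version below is a simplified stand-in for the paper's MPN, with a linear model in place of a network and all names illustrative:

```python
# Simplified "model parsing" sketch: predict a victim-model attribute
# (e.g., an architecture-family label) from adversarial perturbations.
# A linear model stands in for the paper's parsing network.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_parser(x_adv, x_clean, vm_attribute_labels):
    deltas = (x_adv - x_clean).reshape(len(x_adv), -1)  # perturbations
    tr_x, te_x, tr_y, te_y = train_test_split(
        deltas, vm_attribute_labels, random_state=0)
    parser = LogisticRegression(max_iter=1000).fit(tr_x, tr_y)
    return parser, parser.score(te_x, te_y)  # held-out parsing accuracy
```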
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- Reconstructing Training Data with Informed Adversaries [30.138217209991826]
Given access to a machine learning model, can an adversary reconstruct the model's training data?
This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one.
We show it is feasible to reconstruct the remaining data point in this stringent threat model.
arXiv Detail & Related papers (2022-01-13T09:19:25Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge-stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Manipulating SGD with Data Ordering Attacks [23.639512087220137]
We present a class of training-time attacks that require no changes to the underlying model, dataset, or architecture.
In particular, an attacker can disrupt the integrity and availability of a model by simply reordering training batches.
Attacks have a long-term impact in that they decrease model performance hundreds of epochs after the attack took place.
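The attack surface is easy to picture: a malicious data loader changes only the order of batches. The sketch below uses one illustrative policy (sorting each epoch's examples by their current loss), which is an assumption for illustration rather than the paper's specific reordering policies:

```python
# Sketch of a batch-reordering attack: nothing about the data, labels,
# model, or architecture changes -- only the order SGD sees the batches.
# Sorting by current loss is one illustrative policy, not the paper's.
import numpy as np

def adversarial_batch_order(per_example_losses, batch_size):
    """Return index batches ordered from lowest to highest current loss."""
    order = np.argsort(per_example_losses)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# A malicious loader would recompute per-example losses each epoch and feed
# batches in this order instead of drawing a uniform shuffle.
```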
arXiv Detail & Related papers (2021-04-19T22:17:27Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
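Generic GAN-based model inversion can be sketched as latent-space optimization. The snippet below shows the common baseline recipe, not the paper's inversion-specific GAN or its distributional extensions, and `generator.latent_dim` is an assumed attribute:

```python
# Baseline GAN-based model inversion sketch: optimize a latent code so the
# generated image maximizes the target model's confidence for one class.
# (Not the paper's inversion-specific GAN; generator.latent_dim is assumed.)
import torch

def invert_class(generator, target_model, target_class, steps=500, lr=0.05):
    z = torch.randn(1, generator.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = generator(z)
        log_probs = torch.log_softmax(target_model(img), dim=1)
        loss = -log_probs[0, target_class]  # maximize target-class likelihood
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()  # a reconstruction of the target class
```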
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.