Publishing Efficient On-device Models Increases Adversarial
Vulnerability
- URL: http://arxiv.org/abs/2212.13700v1
- Date: Wed, 28 Dec 2022 05:05:58 GMT
- Title: Publishing Efficient On-device Models Increases Adversarial
Vulnerability
- Authors: Sanghyun Hong, Nicholas Carlini, Alexey Kurakin
- Abstract summary: In this paper, we study the security considerations of publishing on-device variants of large-scale models.
We first show that an adversary can exploit on-device models to make attacking the large models easier.
We then show that the vulnerability increases as the similarity between a full-scale model and its efficient variant increases.
- Score: 58.6975494957865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent increases in the computational demands of deep neural networks (DNNs)
have sparked interest in efficient deep learning mechanisms, e.g., quantization
or pruning. These mechanisms enable the construction of small, efficient
versions of commercial-scale models with comparable accuracy, accelerating their
deployment to resource-constrained devices.
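As a concrete, hedged illustration of how such an on-device sibling can be derived, the snippet below applies PyTorch's post-training dynamic quantization to a pretrained classifier. The choice of ResNet-50 and the use of torchvision are assumptions made for the example, not the setup used in the paper.

```python
# Illustrative sketch only: derive an "on-device" variant of a full-scale model
# via post-training dynamic quantization. The model choice and torchvision usage
# are assumptions for the example, not the paper's exact pipeline.
import torch
import torchvision.models as models

# Full-scale ("commercial-scale") model.
full_model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

# Efficient sibling: quantize the Linear layers to int8 (pruning or
# quantization-aware training would be handled analogously).
on_device_model = torch.ao.quantization.quantize_dynamic(
    full_model, {torch.nn.Linear}, dtype=torch.qint8
)

# The two siblings make near-identical predictions on the same input, which is
# exactly the kind of similarity the paper's attack exploits.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    print(full_model(x).argmax(dim=1), on_device_model(x).argmax(dim=1))
```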
In this paper, we study the security considerations of publishing on-device
variants of large-scale models. We first show that an adversary can exploit
on-device models to make attacking the large models easier. In evaluations
across 19 DNNs, by exploiting the published on-device models as a transfer
prior, the adversarial vulnerability of the original commercial-scale models
increases by up to 100x. We then show that the vulnerability grows as the
similarity between a full-scale model and its efficient counterpart increases. Based on
these insights, we propose a defense, similarity-unpairing, that fine-tunes
on-device models with the objective of reducing this similarity. We evaluated
our defense on all 19 DNNs and found that it reduces transferability by up to 90%
and increases the number of queries an attacker requires by a factor of 10-100x. Our results
suggest that further research is needed on the security (or even privacy)
threats caused by publishing those efficient siblings.
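To make the abstract's threat model concrete, here is a minimal sketch of a transfer-prior attack: adversarial examples are crafted with standard L-infinity PGD against the published on-device model, whose weights the adversary holds in full, and then replayed against the full-scale model. The PGD hyperparameters, helper names, and the plain transfer measurement are illustrative assumptions; the paper's actual attack additionally uses the on-device model as a prior to cut down black-box query counts, which this sketch does not reproduce.

```python
# Hedged sketch (not the paper's exact attack): use the public on-device model
# as a white-box surrogate, craft PGD adversarial examples on it, and measure
# how often they transfer to the full-scale target. All names and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(surrogate, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft L-infinity PGD adversarial examples against a white-box surrogate."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)             # keep pixels in a valid range
    return x_adv.detach()

@torch.no_grad()
def transfer_success_rate(target, x_adv, y):
    """Fraction of surrogate-crafted examples that also fool the target model."""
    return (target(x_adv).argmax(dim=1) != y).float().mean().item()

# Hypothetical usage: `surrogate` is a float copy of the published on-device
# weights (an attacker can always dequantize them), `full_model` is the
# commercial-scale target, and (x, y) is a batch of correctly labeled images.
# x_adv = pgd_attack(surrogate, x, y)
# print("transfer success:", transfer_success_rate(full_model, x_adv, y))
```

The similarity-unpairing defense described above would then fine-tune the on-device model specifically to reduce this similarity, which is what drives the transfer success rate back down.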
Related papers
- Disarming Steganography Attacks Inside Neural Network Models [4.750077838548593]
We propose a zero-trust prevention strategy based on AI model attack disarm and reconstruction.
We demonstrate a 100% prevention rate, while the Qint8 and K-LRBP methods introduce only a minimal decrease in model accuracy.
arXiv Detail & Related papers (2023-09-06T15:18:35Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations to model predictions, which harms benign accuracy, we train models to produce uninformative outputs in response to stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Partially Oblivious Neural Network Inference [4.843820624525483]
We show that for neural network models, like CNNs, some information leakage can be acceptable.
We experimentally demonstrate that in a CIFAR-10 network we can leak up to 80% of the model's weights with practically no security impact.
arXiv Detail & Related papers (2022-10-27T05:39:36Z)
- RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN [28.94653593443991]
Recently, backdoor attacks have become an emerging threat to the security of deep neural network (DNN) models.
In this paper, we propose to study and develop a Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC).
arXiv Detail & Related papers (2022-08-22T21:27:09Z)
- Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task.
Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero.
Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
arXiv Detail & Related papers (2022-07-12T10:40:19Z)
- Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses [82.3052187788609]
Adversarial training (AT) is considered to be one of the most reliable defenses against adversarial attacks.
Recent works show generalization improvement with adversarial samples under novel threat models.
We propose a novel threat model called Joint Space Threat Model (JSTM)
Under JSTM, we develop novel adversarial attacks and defenses.
arXiv Detail & Related papers (2021-12-12T21:08:14Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Defence against adversarial attacks using classical and quantum-enhanced Boltzmann machines [64.62510681492994]
Generative models attempt to learn the distribution underlying a dataset, making them inherently more robust to small perturbations.
We find improvements ranging from 5% to 72% against attacks with Boltzmann machines on the MNIST dataset.
arXiv Detail & Related papers (2020-12-21T19:00:03Z)
- EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks [18.241639570479563]
Deep Neural Networks (DNNs) are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications.
We propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks (see the sketch after this list).
Our results indicate that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively.
arXiv Detail & Related papers (2020-04-21T17:17:09Z)
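The EMPIR entry above only names the idea, so here is a minimal hedged sketch of a mixed-precision ensemble: average the softmax outputs of a full-precision model and a quantized sibling and take the argmax. The plain averaging rule and the reuse of the models from the earlier sketches are illustrative assumptions, not EMPIR's exact combination mechanism.

```python
# Hedged sketch of a mixed-precision ensemble in the spirit of EMPIR: combine
# models of different numerical precisions by averaging their class
# probabilities. The averaging rule is an illustrative assumption, not EMPIR's
# exact combination mechanism.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mixed_precision_predict(models, x):
    """Average softmax outputs over models of different precisions, then argmax."""
    probs = torch.stack([F.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)

# Hypothetical usage, reusing the full-precision and quantized models from the
# sketches above:
# y_hat = mixed_precision_predict([full_model, on_device_model], x)
```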
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.