Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving
Adversarial Outcomes
- URL: http://arxiv.org/abs/2110.13541v1
- Date: Tue, 26 Oct 2021 10:09:49 GMT
- Title: Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving
Adversarial Outcomes
- Authors: Sanghyun Hong, Michael-Andrei Panaitescu-Liess, Yiğitcan Kaya,
Tudor Dumitraş
- Abstract summary: Quantization is a technique that transforms the parameter representation of a neural network from floating-point numbers into lower-precision ones.
We propose a new training framework to implement adversarial quantization outcomes.
We show that a single compromised model defeats multiple quantization schemes.
- Score: 5.865029600972316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization is a popular technique that transforms the parameter
representation of a neural network from floating-point numbers into
lower-precision ones (e.g., 8-bit integers). It reduces the memory footprint
and the computational cost at inference, facilitating the deployment of
resource-hungry models. However, the parameter perturbations caused by this
transformation result in behavioral disparities between the model before
and after quantization. For example, a quantized model can misclassify some
test-time samples that are otherwise classified correctly. It is not known
whether such differences lead to a new security vulnerability. We hypothesize
that an adversary may control this disparity to introduce specific behaviors
that activate upon quantization. To study this hypothesis, we weaponize
quantization-aware training and propose a new training framework to implement
adversarial quantization outcomes. Following this framework, we present three
attacks we carry out with quantization: (i) an indiscriminate attack for
significant accuracy loss; (ii) a targeted attack against specific samples; and
(iii) a backdoor attack for controlling the model with an input trigger. We
further show that a single compromised model defeats multiple quantization
schemes, including robust quantization techniques. Moreover, in a federated
learning scenario, we demonstrate that a set of malicious participants who
conspire can inject our quantization-activated backdoor. Lastly, we discuss
potential counter-measures and show that only re-training consistently removes
the attack artifacts. Our code is available at
https://github.com/Secure-AI-Systems-Group/Qu-ANTI-zation
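The behavioral disparity the abstract describes can be illustrated with a minimal sketch. It is not the authors' attack or training framework. It assumes a simple symmetric per-tensor 8-bit scheme and a toy linear classifier; the function names, weights, and input are all hypothetical and chosen only so that the rounding error flips one prediction.

```python
def quantize_symmetric(weights, num_bits=8):
    """Symmetric per-tensor quantization: floats -> signed integers plus a scale."""
    qmax = 2 ** (num_bits - 1) - 1                # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to the floats the quantized model effectively uses."""
    return [v * scale for v in q]


def predict(weights, x):
    """Toy linear classifier: class 1 if the dot product is positive, else class 0."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


# Hypothetical weights and an input that sits close to the decision boundary.
weights = [1.0, 0.011, -0.012]
x = [0.0, 1.0, 0.9]

q, scale = quantize_symmetric(weights)
deq = dequantize(q, scale)

# Each weight moves by at most half a quantization step.
max_error = max(abs(w - d) for w, d in zip(weights, deq))

print("integer weights:", q)
print("max perturbation:", max_error, "(half step:", scale / 2, ")")
print("float prediction:", predict(weights, x))
print("quantized prediction:", predict(deq, x))
```

Here the perturbation of each weight stays within half a quantization step, yet the sample near the boundary changes class after quantization. The paper's hypothesis is that an adversary can train a model so that such changes are deliberate rather than incidental.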
Related papers
- Membership Inference Attacks on Diffusion Models via Quantile Regression [30.30033625685376]
We demonstrate a privacy vulnerability of diffusion models through a membership inference (MI) attack.
Our proposed MI attack learns quantile regression models that predict (a quantile of) the distribution of reconstruction loss on examples not used in training.
We show that our attack outperforms the prior state-of-the-art attack while being substantially less computationally expensive.
arXiv Detail & Related papers (2023-12-08T16:21:24Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization show that our proposed methods can achieve an attack success rate of over 90% on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
- Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
- Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
- Tolerating Adversarial Attacks and Byzantine Faults in Distributed Machine Learning [12.464625883462515]
Adversarial attacks attempt to disrupt the training, retraining, and use of artificial intelligence and machine learning models.
We propose a novel distributed training algorithm, partial synchronous gradient descent (ParSGD), which defends against adversarial attacks and/or tolerates Byzantine faults.
Our results show that with ParSGD, ML models can still produce accurate predictions, as if they were neither attacked nor experiencing failures, even when almost half of the nodes are compromised or have failed.
arXiv Detail & Related papers (2021-09-05T07:55:02Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as backdoor smoothing.
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)