Related papers: Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes

Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes

URL: http://arxiv.org/abs/2110.13541v1
Date: Tue, 26 Oct 2021 10:09:49 GMT
Title: Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes
Authors: Sanghyun Hong, Michael-Andrei Panaitescu-Liess, Yi\u{g}itcan Kaya, Tudor Dumitra\c{s}
Abstract summary: Quantization is a technique that transforms the parameter representation of a neural network from floating-point numbers into lower-precision ones. We propose a new training framework to implement adversarial quantization outcomes. We show that a single compromised model defeats multiple quantization schemes.
Score: 5.865029600972316
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quantization is a popular technique that $transforms$ the parameter representation of a neural network from floating-point numbers into lower-precision ones ($e.g.$, 8-bit integers). It reduces the memory footprint and the computational cost at inference, facilitating the deployment of resource-hungry models. However, the parameter perturbations caused by this transformation result in $behavioral$ $disparities$ between the model before and after quantization. For example, a quantized model can misclassify some test-time samples that are otherwise classified correctly. It is not known whether such differences lead to a new security vulnerability. We hypothesize that an adversary may control this disparity to introduce specific behaviors that activate upon quantization. To study this hypothesis, we weaponize quantization-aware training and propose a new training framework to implement adversarial quantization outcomes. Following this framework, we present three attacks we carry out with quantization: (i) an indiscriminate attack for significant accuracy loss; (ii) a targeted attack against specific samples; and (iii) a backdoor attack for controlling the model with an input trigger. We further show that a single compromised model defeats multiple quantization schemes, including robust quantization techniques. Moreover, in a federated learning scenario, we demonstrate that a set of malicious participants who conspire can inject our quantization-activated backdoor. Lastly, we discuss potential counter-measures and show that only re-training consistently removes the attack artifacts. Our code is available at https://github.com/Secure-AI-Systems-Group/Qu-ANTI-zation

Related papers

Mind the Gap: A Practical Attack on GGUF Quantization [6.506984021742173]
We introduce the first attack on the GGUF family of post-training quantization methods.<n>We develop an attack that trains the target malicious LLM while constraining its weights based on quantization errors.<n>Our attack highlights that the most widely used post-training quantization method is susceptible to adversarial interferences.
arXiv Detail & Related papers (2025-05-24T16:30:37Z)
Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation [95.3977252782181]
Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions. We introduce a novel training paradigm aimed at enhancing robustness against transferable adversarial examples (TAEs) in a more efficient and effective way.
arXiv Detail & Related papers (2025-04-20T09:07:10Z)
PBP: Post-training Backdoor Purification for Malware Classifiers [5.112004957241861]
In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks. Here, we introduce PBP, a post-training defense for malware classifiers that mitigates various types of backdoor embeddings without assuming any specific backdoor embedding mechanism. Our method demonstrates substantial advantages over several state-of-the-art methods, as evidenced by experiments on two datasets, two types of backdoor methods, and various attack configurations.
arXiv Detail & Related papers (2024-12-04T16:30:03Z)
Membership Inference Attacks on Diffusion Models via Quantile Regression [30.30033625685376]
We demonstrate a privacy vulnerability of diffusion models through amembership inference (MI) attack. Our proposed MI attack learns quantile regression models that predict (a quantile of) the distribution of reconstruction loss on examples not used in training. We show that our attack outperforms the prior state-of-the-art attack while being substantially less computationally expensive.
arXiv Detail & Related papers (2023-12-08T16:21:24Z)
Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks. Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence. Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks. CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage. Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack. We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains. As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input. Here, we examine several objective functions for adversarial attacks construction, suggest metrics assess the model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
Tolerating Adversarial Attacks and Byzantine Faults in Distributed Machine Learning [12.464625883462515]
Adversarial attacks attempt to disrupt the training, retraining and utilizing of artificial intelligence and machine learning models. We propose a novel distributed training algorithm, partial synchronous gradient descent (ParSGD), which defends adversarial attacks and/or tolerates Byzantine faults. Our results show that using ParSGD, ML models can still produce accurate predictions as if it is not being attacked nor having failures at all when almost half of the nodes are being compromised or failed.
arXiv Detail & Related papers (2021-09-05T07:55:02Z)
Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data. We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level. Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can neither access the model information or the training set nor query the model. We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective. Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as textitbackdoor smoothing. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.