QuantAttack: Exploiting Dynamic Quantization to Attack Vision
Transformers
- URL: http://arxiv.org/abs/2312.02220v1
- Date: Sun, 3 Dec 2023 18:31:19 GMT
- Title: QuantAttack: Exploiting Dynamic Quantization to Attack Vision
Transformers
- Authors: Amit Baras, Alon Zolfi, Yuval Elovici, Asaf Shabtai
- Abstract summary: We present QuantAttack, a novel attack that targets the availability of quantized models.
We show that carefully crafted adversarial examples, which are designed to exhaust the resources of the operating system, can trigger worst-case performance.
- Score: 29.957089564635083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been a significant trend in deep neural networks
(DNNs), particularly transformer-based models, of developing ever-larger and
more capable models. While they demonstrate state-of-the-art performance, their
growing scale requires increased computational resources (e.g., GPUs with
greater memory capacity). To address this problem, quantization techniques
(i.e., low-bit-precision representation and matrix multiplication) have been
proposed. Most quantization techniques employ a static strategy in which the
model parameters are quantized, either during training or inference, without
considering the test-time sample. In contrast, dynamic quantization techniques,
which have become increasingly popular, adapt during inference based on the
input provided, while maintaining full-precision performance. However, their
dynamic behavior and average-case performance assumption make them vulnerable
to a novel threat vector -- adversarial attacks that target the model's
efficiency and availability. In this paper, we present QuantAttack, a novel
attack that targets the availability of quantized models, slowing down
inference and increasing memory usage and energy consumption. We show that
carefully crafted adversarial examples, which are designed to exhaust the
resources of the operating system, can trigger worst-case performance. In our
experiments, we demonstrate the effectiveness of our attack on vision
transformers on a wide range of tasks, both uni-modal and multi-modal. We also
examine the effect of different attack variants (e.g., a universal
perturbation) and the transferability between different models.
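The attack surface can be made concrete with a short sketch. Outlier-aware dynamic quantization schemes (e.g., LLM.int8()-style matrix multiplication) route activation columns whose magnitude exceeds a threshold to a slower high-precision path, so an availability attack wants as many such outliers as possible. The PGD-style sketch below is illustrative only and is not the paper's objective function: the threshold value, the perturbation budget, and the capture_activations helper are assumptions introduced here.

```python
import torch

# Assumed constants (not from the paper): an LLM.int8()-style quantizer treats
# activation entries with |value| > OUTLIER_THRESHOLD as outliers and computes
# them in fp16 instead of int8, which is the slow path the attack tries to hit.
OUTLIER_THRESHOLD = 6.0
EPSILON = 8 / 255      # L_inf perturbation budget (illustrative)
STEP_SIZE = 1 / 255
NUM_STEPS = 100


def soft_outlier_count(activations: torch.Tensor) -> torch.Tensor:
    # Differentiable surrogate for "how many activation entries exceed the threshold".
    return torch.sigmoid(activations.abs() - OUTLIER_THRESHOLD).mean()


def quant_availability_attack(model, x, capture_activations):
    # capture_activations(model, inputs) is a hypothetical helper that runs the
    # model and returns the pre-matmul activation tensors of the quantized layers.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(NUM_STEPS):
        acts = capture_activations(model, (x + delta).clamp(0, 1))
        loss = torch.stack([soft_outlier_count(a) for a in acts]).mean()
        loss.backward()
        with torch.no_grad():
            delta += STEP_SIZE * delta.grad.sign()   # ascend: create more outliers
            delta.clamp_(-EPSILON, EPSILON)          # stay within the budget
            delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()
```

Maximizing the soft outlier count pushes more matrix-multiplication columns onto the high-precision path, which is the worst case for latency, memory usage, and energy consumption in such schemes.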
Related papers
- The Impact of Quantization on the Robustness of Transformer-based Text Classifiers [5.281054432963503]
This work is the first to study the impact of quantization on the robustness of NLP models.
We evaluate the impact of quantization on BERT and DistilBERT models in text classification using SST-2, Emotion, and MR datasets.
Our experiments indicate that quantization increases the robustness of the model by 18.80% on average compared to adversarial training.
arXiv Detail & Related papers (2024-03-08T14:55:05Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention [43.95101492654236]
Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP)
Recent studies show their vulnerability to textual adversarial attacks where the model's output can be misled by intentionally manipulating the text inputs.
We propose a novel method called dynamic attention, tailored for the transformer architecture, to enhance the inherent robustness of the model itself against various adversarial attacks.
arXiv Detail & Related papers (2023-11-29T07:09:13Z)
- Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z)
- Common Knowledge Learning for Generating Transferable Adversarial Examples [60.1287733223249]
This paper focuses on an important type of black-box attack, where the adversary generates adversarial examples using a substitute (source) model.
Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples.
arXiv Detail & Related papers (2023-07-01T09:07:12Z)
- Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing [18.673619610942197]
Modern transformer models tend to learn strong outliers in their activations, making them difficult to quantize.
We show that strong outliers are related to very specific behavior of attention heads that try to learn a "no-op" or just a partial update of the residual.
We propose two simple (independent) modifications to the attention mechanism - clipped softmax and gated attention.
arXiv Detail & Related papers (2023-06-22T14:39:04Z)
- Temporal Dynamic Quantization for Diffusion Models [18.184163233551292]
We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information.
Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference.
Our experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
arXiv Detail & Related papers (2023-06-04T09:49:43Z)
- Frequency Domain Model Augmentation for Adversarial Attack [91.36850162147678]
For black-box attacks, the gap between the substitute model and the victim model is usually large.
We propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models.
arXiv Detail & Related papers (2022-07-12T08:26:21Z)
- Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs into producing unexpected latent representations and reconstructions for an input with only slight visual modifications.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics for assessing model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We obtain 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.