The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
- URL: http://arxiv.org/abs/2403.05365v1
- Date: Fri, 8 Mar 2024 14:55:05 GMT
- Title: The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
- Authors: Seyed Parsa Neshaei, Yasaman Boreshban, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel
- Abstract summary: This work is the first study of the impact of quantization on the robustness of NLP models.
We evaluate the impact of quantization on BERT and DistilBERT models in text classification using SST-2, Emotion, and MR datasets.
Our experiments indicate that quantization increases the robustness of the model by 18.80% on average compared to adversarial training.
- Score: 5.281054432963503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models have made remarkable advancements in various NLP
areas. Nevertheless, these models often exhibit vulnerabilities when confronted
with adversarial attacks. In this paper, we explore the effect of quantization
on the robustness of Transformer-based models. Quantization usually involves
mapping a high-precision real number to a lower-precision value, aiming at
reducing the size of the model at hand. To the best of our knowledge, this work
is the first study of the effect of quantization on the robustness of NLP models. In
our experiments, we evaluate the impact of quantization on BERT and DistilBERT
models in text classification using SST-2, Emotion, and MR datasets. We also
evaluate the performance of these models against TextFooler, PWWS, and PSO
adversarial attacks. Our findings show that quantization significantly improves
(by an average of 18.68%) the adversarial accuracy of the models. Furthermore,
we compare the effect of quantization versus that of the adversarial training
approach on robustness. Our experiments indicate that quantization increases
the robustness of the model by 18.80% on average compared to adversarial
training without imposing any extra computational overhead during training.
Therefore, our results highlight the effectiveness of quantization in improving
the robustness of NLP models.
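To make the setup concrete, the following is a minimal sketch of the kind of pipeline the abstract describes: a BERT sentiment classifier is converted to int8 with PyTorch's dynamic quantization and then attacked with TextFooler through the TextAttack library. The checkpoint name, dataset split, and number of attacked examples are illustrative assumptions, not details taken from the paper, and the paper's exact quantization scheme may differ.

```python
# Illustrative sketch only: quantize a BERT classifier to int8 and attack it
# with TextFooler. Checkpoint, split, and attack budget are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

name = "textattack/bert-base-uncased-SST-2"  # assumed public SST-2 checkpoint
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

# Dynamic quantization maps each float32 Linear weight w to int8 via an
# affine scale (roughly q = round(w / s) with s = max|w| / 127); activations
# are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Attack the quantized model on SST-2 validation examples; adversarial
# accuracy corresponds to the fraction of attacks that fail.
wrapper = HuggingFaceModelWrapper(quantized, tokenizer)
attack = TextFoolerJin2019.build(wrapper)
dataset = HuggingFaceDataset("glue", "sst2", split="validation")
Attacker(attack, dataset, AttackArgs(num_examples=100)).attack_dataset()
```

Swapping in TextAttack's PWWSRen2019 or PSOZang2020 recipes covers the other two attacks the paper uses, and running the same loop on the unquantized wrapper gives the full-precision baseline for comparison.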
Related papers
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms such as low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- When Quantization Affects Confidence of Large Language Models? [4.338589334157708]
We show that quantizing with GPTQ to 4-bit results in a decrease in confidence regarding true labels, with varying impacts observed among different language models.
We propose an explanation for quantization loss based on confidence levels, indicating that quantization disproportionately affects samples on which the full-precision model already exhibited low confidence (a toy sketch of this confidence comparison follows the list below).
arXiv Detail & Related papers (2024-05-01T16:58:28Z)
- Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency [3.3490724063380215]
Adversarial training has been presented as a mitigation strategy that can result in more robust models.
We explore the effects of two different model compression methods -- structured weight pruning and quantization -- on adversarial robustness.
We show that adversarial fine-tuning of compressed models can achieve robustness performance comparable to adversarially trained models.
arXiv Detail & Related papers (2024-03-14T14:34:25Z) - QuantAttack: Exploiting Dynamic Quantization to Attack Vision
Transformers [29.957089564635083]
We present QuantAttack, a novel attack that targets the availability of quantized models.
We show that carefully crafted adversarial examples, which are designed to exhaust the resources of the operating system, can trigger worst-case performance.
arXiv Detail & Related papers (2023-12-03T18:31:19Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the performance of the Llama 2 model by up to 15% points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - RobustMQ: Benchmarking Robustness of Quantized Models [54.15661421492865]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources.
We thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet.
Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-08-04T14:37:12Z) - Mixed-Precision Inference Quantization: Radically Towards Faster
inference speed, Lower Storage requirement, and Lower Loss [4.877532217193618]
Existing quantization techniques rely heavily on experience and "fine-tuning" skills.
This study provides a methodology for acquiring a mixed-precision quantization model with a lower loss than the full-precision model.
In particular, we will demonstrate that neural networks with massive identity mappings are resistant to the quantization method.
arXiv Detail & Related papers (2022-07-20T10:55:34Z) - PLATON: Pruning Large Transformer Models with Upper Confidence Bound of
Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
arXiv Detail & Related papers (2022-06-25T05:38:39Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Zero-shot Adversarial Quantization [11.722728148523366]
We propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer.
This is achieved by a novel two-level discrepancy modeling to drive a generator to synthesize informative and diverse data examples.
We conduct extensive experiments on three fundamental vision tasks, demonstrating the superiority of ZAQ over the strong zero-shot baselines.
arXiv Detail & Related papers (2021-03-29T01:33:34Z) - From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)
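As flagged in the "When Quantization Affects Confidence of Large Language Models?" entry above, the following toy sketch illustrates that paper's confidence analysis: compare a model's confidence in the true label before and after quantization, split by how confident the full-precision model was to begin with. It substitutes a small untrained PyTorch classifier and dynamic int8 quantization for GPTQ 4-bit LLM quantization, and the data is synthetic, so only the shape of the analysis (not the numbers) is meaningful.

```python
# Toy sketch: does quantization hurt low-confidence samples the most?
# Synthetic data and dynamic int8 quantization stand in for the paper's
# GPTQ 4-bit setting; results here are purely illustrative.
import torch

torch.manual_seed(0)
X = torch.randn(1000, 32)           # synthetic inputs
y = torch.randint(0, 2, (1000,))    # synthetic true labels

model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

idx = torch.arange(len(y))
with torch.no_grad():
    conf_full = torch.softmax(model(X), dim=1)[idx, y]       # P(true label), fp32
    conf_quant = torch.softmax(quantized(X), dim=1)[idx, y]  # P(true label), int8

# Bucket by the full-precision model's confidence and compare the drop.
low = conf_full < conf_full.median()
drop = conf_full - conf_quant
print("mean drop on low-confidence samples: ", drop[low].mean().item())
print("mean drop on high-confidence samples:", drop[~low].mean().item())
```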