Related papers: Investigating the Impact of Quantization on Adversarial Robustness

URL: http://arxiv.org/abs/2404.05639v1
Date: Mon, 8 Apr 2024 16:20:15 GMT
Title: Investigating the Impact of Quantization on Adversarial Robustness
Authors: Qun Li, Yuan Meng, Chen Tang, Jiacheng Jiang, Zhi Wang
Abstract summary: Quantization is a technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency. In real-world scenarios, quantized models are often faced with adversarial attacks which cause the model to make incorrect inferences. We conduct a first-time analysis of the impact of the quantization pipeline components that can incorporate robust optimization.
Score: 22.637585106574722
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and thus becomes a fundamental step for deployment. In real-world scenarios, quantized models are often faced with adversarial attacks which cause the model to make incorrect inferences by introducing slight perturbations. However, recent studies have paid less attention to the impact of quantization on the model robustness. More surprisingly, existing studies on this topic even present inconsistent conclusions, which prompted our in-depth investigation. In this paper, we conduct a first-time analysis of the impact of the quantization pipeline components that can incorporate robust optimization under the settings of Post-Training Quantization and Quantization-Aware Training. Through our detailed analysis, we discovered that this inconsistency arises from the use of different pipelines in different studies, specifically regarding whether robust optimization is performed and at which quantization stage it occurs. Our research findings contribute insights into deploying more secure and robust quantized networks, assisting practitioners in reference for scenarios with high-security requirements and limited resources.

Related papers

Quantization-Aware Collaborative Inference for Large Embodied AI Models [67.66340659245186]
Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems.
arXiv Detail & Related papers (2026-02-13T16:08:19Z)
Assessing the Potential for Catastrophic Failure in Dynamic Post-Training Quantization [3.437656066916039]
Post-training quantization (PTQ) has emerged as an effective tool for reducing the computational complexity and memory usage of a neural network. There is the potential for drastic performance reduction depending upon the distribution of inputs experienced in inference.
arXiv Detail & Related papers (2025-10-02T18:13:06Z)
The Disparate Impacts of Speculative Decoding [54.98795989404752]
Speculative decoding is a technique for systematically reducing the decoding time of large language models. The paper shows that the speed-up gained from speculative decoding is not uniformly distributed across tasks, consistently diminishing for under-fit and often underrepresented tasks.
arXiv Detail & Related papers (2025-10-02T15:38:57Z)
Explaining How Quantization Disparately Skews a Model [8.210473195536077]
Post Training Quantization (PTQ) is widely adopted due to its high compression capacity and speed with minimal impact on accuracy. We observed that disparate impacts are exacerbated by quantization, especially for minority groups. We explore how the changes in weights and activations induced by quantization cause cascaded impacts in the network, resulting in logits with lower variance, increased loss, and compromised group accuracies.
arXiv Detail & Related papers (2025-09-08T21:04:16Z)
Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation [7.193483612237862]
In this study, we analyze the impact of quantization on model merging through the lens of error barriers. We propose a novel post-training quantization, HDRQ - Hessian and distant regularizing quantization, that is designed to consider model merging for multi-target domain adaptation. Our approach ensures that the quantization process incurs minimal deviation from the source pre-trained model while flattening the loss surface to facilitate smooth model merging.
arXiv Detail & Related papers (2025-05-29T17:00:56Z)
Diffusion Model Quantization: A Review [36.22019054372206]
Recent success of large text-to-image models has underscored the exceptional performance of diffusion models in generative tasks. Diffusion model quantization has emerged as a pivotal technique for both compression and acceleration.
arXiv Detail & Related papers (2025-05-08T13:09:34Z)
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models [48.98109982725689]
We conduct the first systematic study on quantized reasoning models, evaluating the open-sourced DeepSeek-R1-Distilled Qwen and LLaMA families. Our investigation covers weight, KV cache, and activation quantization using state-of-the-art algorithms at varying bit-widths. We identify model size, model origin, and task difficulty as critical determinants of performance.
arXiv Detail & Related papers (2025-04-07T08:22:45Z)
Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
Saliency Assisted Quantization for Neural Networks [0.0]
This paper tackles the inherent black-box nature of deep learning models by providing real-time explanations during the training phase. We employ established quantization techniques to address resource constraints. To assess the effectiveness of our approach, we explore how quantization influences the interpretability and accuracy of Convolutional Neural Networks.
arXiv Detail & Related papers (2024-11-07T05:16:26Z)
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners [17.43650511873449]
Large Language Models (LLMs) showcase remarkable performance and robust deductive capabilities, yet their expansive size complicates deployment and raises environmental concerns due to substantial resource consumption. We have developed innovative methods that enhance the performance of quantized LLMs, particularly in low-bit settings. Our methods consistently deliver state-of-the-art results across various quantization scenarios and offer deep theoretical insights into the quantization process, elucidating the potential of quantized models for widespread application.
arXiv Detail & Related papers (2024-07-22T09:45:16Z)
See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads. We take the first step to unify all approaches by dissecting them from a decomposition perspective. We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty. We propose to adjust these distributions through weight finetuning to be more quantization-friendly. Our method demonstrates its efficacy across three high-resolution image generation tasks.
arXiv Detail & Related papers (2024-02-06T03:39:44Z)
Effect of Weight Quantization on Learning Models by Typical Case Analysis [6.9060054915724]
The recent surge in data analysis scale has significantly increased computational resource requirements. Quantization is vital for deploying large models on devices with limited computational resources.
arXiv Detail & Related papers (2024-01-30T18:58:46Z)
RobustMQ: Benchmarking Robustness of Quantized Models [54.15661421492865]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. We thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-08-04T14:37:12Z)
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study [90.34226812493083]
This work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation. To improve the performance of low-bit models, we conduct two special experiments: (1) fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning.
arXiv Detail & Related papers (2023-07-16T15:11:01Z)
PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant. PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error. We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
Re-thinking Data Availablity Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective. We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv Detail & Related papers (2023-05-18T04:03:51Z)
Benchmarking the Robustness of Quantized Models [12.587947681480909]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. Existing research on this topic is limited and often disregards established principles of evaluation. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-04-08T09:34:55Z)
Counterfactual Learning with Multioutput Deep Kernels [0.0]
In this paper, we address the challenge of performing counterfactual inference with observational data. We present a general class of counterfactual multi-task deep kernels models that estimate causal effects and learn policies proficiently.
arXiv Detail & Related papers (2022-11-20T23:28:41Z)
Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method. We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.