Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs
- URL: http://arxiv.org/abs/2602.13289v1
- Date: Sun, 08 Feb 2026 20:06:24 GMT
- Title: Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs
- Authors: Paul Jonas Kurz, Tobias Jan Wieczorek, Mohamed A. Abdelsalam, Rahaf Aljundi, Marcus Rohrbach
- Abstract summary: We study how Post-Training Quantization (PTQ) compression affects both accuracy and reliability in Visual Question Answering (VQA). We adapt the Selector confidence estimator for quantized multimodal settings and test its robustness across various quantization levels and out-of-distribution (OOD) scenarios.
- Score: 12.376901102913417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) are increasingly deployed in domains where both reliability and efficiency are critical. However, current models remain overconfident, producing highly certain but incorrect answers. At the same time, their large size limits deployment on edge devices, necessitating compression. We study the intersection of these two challenges by analyzing how Post-Training Quantization (PTQ) compression affects both accuracy and reliability in Visual Question Answering (VQA). We evaluate two MLLMs, Qwen2-VL-7B and Idefics3-8B, quantized with data-free (HQQ) and data-aware (MBQ) methods across multiple bit widths. To counteract the reduction in reliability caused by quantization, we adapt the Selector confidence estimator for quantized multimodal settings and test its robustness across various quantization levels and out-of-distribution (OOD) scenarios. We find that PTQ degrades both accuracy and reliability. Data-aware methods soften this effect. The Selector substantially mitigates the reliability impact. The combination of int4 MBQ and the Selector achieves the best efficiency-reliability trade-off, closing in on uncompressed performance at approx. 75% less memory demand. Overall, we present the first systematic study linking quantization and reliability in multimodal settings.
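The abstract above combines two ideas: low-bit weight quantization (e.g. int4) and selective prediction via a confidence estimator (the Selector). A minimal sketch of both concepts follows, using simulated group-wise symmetric int4 quantization and a generic confidence threshold; this is an illustration only, not the paper's actual Selector nor the HQQ/MBQ implementations, and all function names here are hypothetical:

```python
import numpy as np

def quantize_int4(w: np.ndarray, group_size: int = 64) -> np.ndarray:
    """Simulated symmetric int4 PTQ: per-group scale, round, clip, dequantize."""
    flat = w.reshape(-1, group_size)
    # int4 symmetric range is [-8, 7]; scale maps each group's max magnitude to 7
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape)

def selective_risk_coverage(conf: np.ndarray, correct: np.ndarray, tau: float):
    """Selective prediction: answer only when confidence >= tau.

    Returns (coverage, risk): fraction of questions answered, and the
    error rate among the answered ones."""
    answered = conf >= tau
    coverage = answered.mean()
    risk = 1.0 - correct[answered].mean() if answered.any() else 0.0
    return coverage, risk

# Quantize a random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128)).astype(np.float32)
w_q = quantize_int4(w)
print(f"mean abs quantization error: {np.abs(w - w_q).mean():.4f}")

# Trade coverage for risk by thresholding toy confidence scores.
conf = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
correct = np.array([1, 1, 0, 1, 0])
cov, risk = selective_risk_coverage(conf, correct, tau=0.5)
print(f"coverage={cov:.2f}, selective risk={risk:.2f}")
```

Raising `tau` lowers coverage but typically also lowers risk; the paper's efficiency-reliability trade-off corresponds to choosing a quantization level and an operating point on this curve jointly.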
Related papers
- Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations [18.22236071202241]
Quantization typically leads to moderate declines in both self-explanations (SEs) and faithfulness. No quantization technique consistently excels across task accuracy, SE quality, and faithfulness.
arXiv Detail & Related papers (2026-01-01T09:50:01Z) - Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges [12.438306093697]
Large language models (LLMs) have shown promising performance across various tasks. LLMs' autoregressive decoding process poses significant challenges for efficient deployment on existing AI hardware.
arXiv Detail & Related papers (2025-11-27T14:17:43Z) - SPEED-Q: Staged Processing with Enhanced Distillation towards Efficient Low-bit On-device VLM Quantization [6.872509247180761]
Vision-Language Models (VLMs) are crucial for enabling low-latency and privacy-preserving intelligent applications. We propose SPEED-Q, a novel framework for low-bit weight-only quantization of VLMs. SPEED-Q achieves up to 6x higher accuracy than existing quantization methods under 2-bit settings.
arXiv Detail & Related papers (2025-11-12T02:47:24Z) - Beyond Outliers: A Study of Optimizers Under Quantization [82.75879062804955]
We study the impact of optimizer choice on model robustness under quantization. We evaluate how model performance degrades when trained with different optimizers. We derive scaling laws for quantization-aware training under different parameters.
arXiv Detail & Related papers (2025-09-27T21:15:22Z) - AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering [8.946002046630845]
Large Language Models (LLMs) have recently demonstrated strong potential for cybersecurity question answering (QA). Their substantial computational demands pose significant challenges for deployment on resource-constrained edge devices. We propose AQUA-LLM, an evaluation framework designed to benchmark several state-of-the-art small LLMs under four distinct configurations.
arXiv Detail & Related papers (2025-09-16T20:19:24Z) - Quantum Federated Learning for Multimodal Data: A Modality-Agnostic Approach [1.1008520905907015]
Quantum federated learning (QFL) has been introduced to enable distributed privacy-preserving quantum machine learning (QML) model training across quantum processors (clients). We present, for the first time, a novel multimodal approach specifically tailored for the QFL setting, with intermediate fusion using quantum entanglement. We introduce a Missing Modality Agnostic (MMA) mechanism that isolates untrained quantum circuits, ensuring stable training without corrupted states.
arXiv Detail & Related papers (2025-07-10T23:33:58Z) - Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding [48.92310906093414]
We introduce a novel approach for calibrating uncertainty quantification (UQ) tailored for multi-modal large language models (LLMs). We leverage cross-modal consistency in addition to self-consistency to improve the calibration of multi-modal models. We evaluate the proposed approach across multiple multi-modal tasks, such as medical question answering (Slake) and visual question answering (VQAv2), considering multi-modal models such as LLaVA-Med and LLaVA.
arXiv Detail & Related papers (2025-04-30T19:19:21Z) - Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency [66.96286531087549]
Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches. We propose a novel approach to integrating model confidence with output consistency, resulting in a family of efficient and robust UQ methods. We evaluate our approach across various tasks such as question answering, abstractive summarization, and machine translation.
arXiv Detail & Related papers (2025-02-07T14:30:12Z) - Multi-QuAD: Multi-Level Quality-Adaptive Dynamic Network for Reliable Multimodal Classification [57.08108545219043]
Existing reliable multimodal classification methods fail to provide robust estimation of data quality. A new framework for reliable classification, termed Multi-level Quality-Adaptive Dynamic multimodal network (Multi-QuAD), is proposed. Experiments conducted on four datasets demonstrate that Multi-QuAD significantly outperforms state-of-the-art methods in classification performance and reliability.
arXiv Detail & Related papers (2024-12-19T03:26:51Z) - QSpec: Speculative Decoding with Complementary Quantization Schemes [53.960146187821685]
Quantization is widely adopted to accelerate inference and reduce memory consumption in large language models (LLMs). We propose QSpec, a novel quantization paradigm that decouples efficiency from quality. QSpec reuses both weights and KV cache across stages, enabling near-zero-cost switching without retraining or auxiliary models.
arXiv Detail & Related papers (2024-10-15T05:57:51Z) - LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z) - Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance [53.45700148820669]
Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods under extreme conditions such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
arXiv Detail & Related papers (2023-03-23T02:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.