Related papers: AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering

AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering

URL: http://arxiv.org/abs/2509.13514v1
Date: Tue, 16 Sep 2025 20:19:24 GMT
Title: AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering
Authors: Onat Gungor, Roshan Sood, Harold Wang, Tajana Rosing,
Abstract summary: Large Language Models (LLMs) have recently demonstrated strong potential for cybersecurity question answering (QA)<n>Their substantial computational demands pose significant challenges for deployment on resource-constrained edge devices.<n>We propose AQUA-LLM, an evaluation framework designed to benchmark several state-of-the-art small LLMs under four distinct configurations.
Score: 8.946002046630845
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have recently demonstrated strong potential for cybersecurity question answering (QA), supporting decision-making in real-time threat detection and response workflows. However, their substantial computational demands pose significant challenges for deployment on resource-constrained edge devices. Quantization, a widely adopted model compression technique, can alleviate these constraints. Nevertheless, quantization may degrade model accuracy and increase susceptibility to adversarial attacks. Fine-tuning offers a potential means to mitigate these limitations, but its effectiveness when combined with quantization remains insufficiently explored. Hence, it is essential to understand the trade-offs among accuracy, efficiency, and robustness. We propose AQUA-LLM, an evaluation framework designed to benchmark several state-of-the-art small LLMs under four distinct configurations: base, quantized-only, fine-tuned, and fine-tuned combined with quantization, specifically for cybersecurity QA. Our results demonstrate that quantization alone yields the lowest accuracy and robustness despite improving efficiency. In contrast, combining quantization with fine-tuning enhances both LLM robustness and predictive performance, achieving an optimal balance of accuracy, robustness, and efficiency. These findings highlight the critical need for quantization-aware, robustness-preserving fine-tuning methodologies to enable the robust and efficient deployment of LLMs for cybersecurity QA.

Related papers

Quantization-Aware Collaborative Inference for Large Embodied AI Models [67.66340659245186]
Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications.<n>To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems.
arXiv Detail & Related papers (2026-02-13T16:08:19Z)
Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications.<n>We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z)
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges [12.438306093697]
Large language models (LLMs) have shown promising performance across various tasks.<n>LLMs' autoregressive decoding process poses significant challenges for efficient deployment on existing AI hardware.
arXiv Detail & Related papers (2025-11-27T14:17:43Z)
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA [65.38186593873313]
Multi-Hop Question Answering (MHQA) requires integrating dispersed, interdependent evidence through sequential reasoning under noise.<n>We introduce a proof-of-concept multi-call framework for MHQA, InfoQA.<n>We construct a stringent and noise-rich benchmark to validate our theory and framework.
arXiv Detail & Related papers (2025-09-25T14:11:57Z)
Progressive Element-wise Gradient Estimation for Neural Network Quantization [2.1413624861650358]
Quantization-Aware Training (QAT) methods rely on the Straight-Through Estimator (STE) to address the non-differentiability of discretization functions.<n>We propose Progressive Element-wise Gradient Estimation (PEGE) to address discretization errors between continuous and quantized values.<n>PEGE consistently outperforms existing backpropagation methods and enables low-precision models to match or even outperform the accuracy of their full-precision counterparts.
arXiv Detail & Related papers (2025-08-27T15:59:36Z)
ZeroQAT: Your Quantization-aware Training but Efficient [53.25965863436039]
Quantization is an effective technique to reduce the deployment cost of large language models (LLMs)<n>Existing low-bit PTQ methods suffer from accuracy degradation because their layer-wise optimization introduces cumulative error propagation and misalignment between local reconstruction objectives and downstream performance.<n>We propose ZeroQAT, a zeroth-order optimization-based QAT framework.
arXiv Detail & Related papers (2025-08-21T01:18:27Z)
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models [37.68831497886983]
Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments.<n>We present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets.<n>We propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs.
arXiv Detail & Related papers (2025-06-25T08:52:22Z)
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness.<n>We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems to harmonize reliability and usability.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
LEP-QNN: Loan Eligibility Prediction using Quantum Neural Networks [4.2435928520499635]
We propose a novel approach that employs Quantum Machine Learning (QML) for Loan Eligibility Prediction using Quantum Neural Networks (LEP-QNN)<n>Our innovative approach achieves an accuracy of 98% in predicting loan eligibility from a single, comprehensive dataset.<n>This research showcases the potential of QML in financial predictions and establishes a foundational guide for advancing QML technologies.
arXiv Detail & Related papers (2024-12-04T09:35:03Z)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices. For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator. For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs) Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization. We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
Potential and limitations of quantum extreme learning machines [55.41644538483948]
We present a framework to model QRCs and QELMs, showing that they can be concisely described via single effective measurements. Our analysis paves the way to a more thorough understanding of the capabilities and limitations of both QELMs and QRCs.
arXiv Detail & Related papers (2022-10-03T09:32:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.