Related papers: Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models

Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models

URL: http://arxiv.org/abs/2506.20251v1
Date: Wed, 25 Jun 2025 08:52:22 GMT
Title: Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
Authors: Kejia Chen, Jiawen Zhang, Jiacong Hu, Yu Wang, Jian Lou, Zunlei Feng, Mingli Song,
Abstract summary: Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments.<n>We present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets.<n>We propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs.
Score: 37.68831497886983
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experimental results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page is available at: https://github.com/Thecommonirin/Qresafe.

Related papers

Learning Safety Constraints for Large Language Models [41.95596134688853]
Large language models (LLMs) pose significant safety risks through harmful outputs and vulnerability to adversarial attacks.<n>We propose SaP, a geometric approach to safety that learns and enforces multiple safety constraints directly in the model's representation space.<n>We develop a framework that identifies safe and unsafe regions via the polytope's facets, enabling both detection and correction of unsafe outputs.
arXiv Detail & Related papers (2025-05-30T10:30:24Z)
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models [16.30545036335344]
We release a human-curated safety dataset with 1.067 challenging questions to rigorously evaluate model behavior.<n>We assess 66 quantized variants of four large language models using four post-training quantization (PTQ) and two quantization-aware training (QAT) methods.<n>Our results show both PTQ and QAT can degrade safety alignment, with QAT techniques like QLORA or STE performing less safely.
arXiv Detail & Related papers (2025-02-18T20:32:05Z)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs. Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol. Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
Evaluation Framework for Quantum Security Risk Assessment: A Comprehensive Strategy for Quantum-Safe Transition [0.03749861135832072]
The rise of large-scale quantum computing poses a significant threat to traditional cryptographic security measures. Quantum attacks undermine current asymmetric cryptographic algorithms, rendering them ineffective. This study explores the challenges of migrating to quantum-safe cryptographic states.
arXiv Detail & Related papers (2024-04-12T04:18:58Z)
Efficiently Computable Safety Bounds for Gaussian Processes in Active Learning [6.217857116096573]
In many technical applications the design space is explored via continuous trajectories, along which the safety needs to be assessed. This is particularly challenging for strict safety requirements in GP methods, as it employs computationally expensive Monte-Carlo sampling of high quantiles. We address these challenges by providing provable safety bounds based on the adaptively sampled median of the supremum of the posterior GP.
arXiv Detail & Related papers (2024-02-28T11:47:15Z)
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications. This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy. Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers. We then present the pointwise feasibility conditions of the resulting safety controller. We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial. Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size. We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.