Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference
- URL: http://arxiv.org/abs/2512.15335v1
- Date: Wed, 17 Dec 2025 11:28:58 GMT
- Title: Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference
- Authors: Chenxiang Zhang, Tongxi Qu, Zhong Li, Tian Zhang, Jun Pang, Sjouke Mauw,
- Abstract summary: We present the first systematic study of the privacy-utility relationship in post-training quantization (PTQ)<n>We analyze three popular PTQ algorithms-AdaRound, BRECQ, and OBC-across multiple precision levels (4-bit, 2-bit, and 1.58-bit) on CIFAR-10, CIFAR-100, and TinyImageNet datasets.<n>Our findings consistently show that low-precision PTQs can reduce privacy leakage.
- Score: 18.395729829969433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are widely deployed with quantization techniques to reduce memory and computational costs by lowering the numerical precision of their parameters. While quantization alters model parameters and their outputs, existing privacy analyses primarily focus on full-precision models, leaving a gap in understanding how bit-width reduction can affect privacy leakage. We present the first systematic study of the privacy-utility relationship in post-training quantization (PTQ), a versatile family of methods that can be applied to pretrained models without further training. Using membership inference attacks as our evaluation framework, we analyze three popular PTQ algorithms-AdaRound, BRECQ, and OBC-across multiple precision levels (4-bit, 2-bit, and 1.58-bit) on CIFAR-10, CIFAR-100, and TinyImageNet datasets. Our findings consistently show that low-precision PTQs can reduce privacy leakage. In particular, lower-precision models demonstrate up to an order of magnitude reduction in membership inference vulnerability compared to their full-precision counterparts, albeit at the cost of decreased utility. Additional ablation studies on the 1.58-bit quantization level show that quantizing only the last layer at higher precision enables fine-grained control over the privacy-utility trade-off. These results offer actionable insights for practitioners to balance efficiency, utility, and privacy protection in real-world deployments.
Related papers
- Quantization-Aware Regularizers for Deep Neural Networks Compression [0.061173711613792085]
We introduce per-layer regularization terms that drive weights to naturally form clusters during training.<n>This reduces the accuracy loss typically associated with quantization methods.<n> Experiments on CIFAR-10 with AlexNet and VGG16 models confirm the effectiveness of the proposed strategy.
arXiv Detail & Related papers (2026-02-03T15:07:43Z) - CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization.<n>CAGE significantly improves upon the state-of-theart methods in terms of accuracy, for similar computational cost.<n>For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4-bits (W4A4) with the prior best method.
arXiv Detail & Related papers (2025-10-21T16:33:57Z) - Beyond Outliers: A Study of Optimizers Under Quantization [82.75879062804955]
We study impact of choice on model robustness under quantization.<n>We evaluate how model performance degrades when trained with different baselines.<n>We derive scaling laws for quantization-aware training under different parameters.
arXiv Detail & Related papers (2025-09-27T21:15:22Z) - DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling [7.79764032127686]
Differentially-Private SGD (DP-SGD) is a powerful technique to protect user privacy when using sensitive data to train neural networks.<n>We show that quantization causes significantly higher accuracy degradation in DP-SGD compared to regular SGD.<n>We present QPQuant, a dynamic quantization framework that adaptively selects a changing subset of layers to quantize at each epoch.
arXiv Detail & Related papers (2025-09-03T16:51:26Z) - End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost [53.25965863436039]
Quantization-aware training (QAT) provides a more principled solution, but its reliance on backpropagation incurs prohibitive memory costs.<n>We propose ZeroQAT, a zeroth-order optimization-based QAT framework that supports both weight and activation quantization.<n>Experiments show that ZeroQAT consistently outperforms representative PTQ and QAT baselines while requiring significantly less memory.
arXiv Detail & Related papers (2025-08-21T01:18:27Z) - Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart [1.5508907979229383]
Quantization-aware training (QAT) is a common paradigm for network quantization.<n>The low-precision model exhibits limited representation capabilities and cannot directly replicate full-precision calculations.<n>We propose a general QAT framework for alleviating the concerns by permitting the forward and backward processes of the low-precision network to be guided by the full-precision partner.
arXiv Detail & Related papers (2024-12-20T12:38:18Z) - GAQAT: gradient-adaptive quantization-aware training for domain generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for DG.<n>Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.<n>Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z) - AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization [5.572159724234467]
Mixed-precision quantization distinguishes between important and unimportant parameters.
Existing approaches can only identify important parameters through qualitative analysis and manual experiments.
We propose a new criterion, so-called 'precision alignment', to build a quantitative framework to holistically evaluate the importance of parameters.
arXiv Detail & Related papers (2024-09-25T01:39:02Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z) - SQuAT: Sharpness- and Quantization-Aware Training for BERT [43.049102196902844]
We propose sharpness- and quantization-aware training (SQuAT)
Our method can consistently outperform state-of-the-art quantized BERT models under 2, 3, and 4-bit settings by 1%.
Our experiments on empirical measurement of sharpness also suggest that our method would lead to flatter minima compared to other quantization methods.
arXiv Detail & Related papers (2022-10-13T16:52:19Z) - A Blessing of Dimensionality in Membership Inference through
Regularization [29.08230123469755]
We show how the number of parameters of a model can induce a privacy--utility trade-off.
We then show that if coupled with proper generalization regularization, increasing the number of parameters of a model can actually increase both its privacy and performance.
arXiv Detail & Related papers (2022-05-27T15:44:00Z) - Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints.
We also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets.
arXiv Detail & Related papers (2020-08-21T18:43:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.