Related papers: DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling

DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling

URL: http://arxiv.org/abs/2509.03472v1
Date: Wed, 03 Sep 2025 16:51:26 GMT
Title: DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
Authors: Yubo Gao, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar,
Abstract summary: Differentially-Private SGD (DP-SGD) is a powerful technique to protect user privacy when using sensitive data to train neural networks.<n>We show that quantization causes significantly higher accuracy degradation in DP-SGD compared to regular SGD.<n>We present QPQuant, a dynamic quantization framework that adaptively selects a changing subset of layers to quantize at each epoch.
Score: 7.79764032127686
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Differentially-Private SGD (DP-SGD) is a powerful technique to protect user privacy when using sensitive data to train neural networks. During training, converting model weights and activations into low-precision formats, i.e., quantization, can drastically reduce training times, energy consumption, and cost, and is thus a widely used technique. In this work, we demonstrate that quantization causes significantly higher accuracy degradation in DP-SGD compared to regular SGD. We observe that this is caused by noise injection in DP-SGD, which amplifies quantization variance, leading to disproportionately large accuracy degradation. To address this challenge, we present QPQuant, a dynamic quantization framework that adaptively selects a changing subset of layers to quantize at each epoch. Our method combines two key ideas that effectively reduce quantization variance: (i) probabilistic sampling of the layers that rotates which layers are quantized every epoch, and (ii) loss-aware layer prioritization, which uses a differentially private loss sensitivity estimator to identify layers that can be quantized with minimal impact on model quality. This estimator consumes a negligible fraction of the overall privacy budget, preserving DP guarantees. Empirical evaluations on ResNet18, ResNet50, and DenseNet121 across a range of datasets demonstrate that DPQuant consistently outperforms static quantization baselines, achieving near Pareto-optimal accuracy-compute trade-offs and up to 2.21x theoretical throughput improvements on low-precision hardware, with less than 2% drop in validation accuracy.

Related papers

Progressive Element-wise Gradient Estimation for Neural Network Quantization [2.1413624861650358]
Quantization-Aware Training (QAT) methods rely on the Straight-Through Estimator (STE) to address the non-differentiability of discretization functions.<n>We propose Progressive Element-wise Gradient Estimation (PEGE) to address discretization errors between continuous and quantized values.<n>PEGE consistently outperforms existing backpropagation methods and enables low-precision models to match or even outperform the accuracy of their full-precision counterparts.
arXiv Detail & Related papers (2025-08-27T15:59:36Z)
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics [64.62231094774211]
Statefuls (e.g., Adam) maintain auxiliary information even 2x the model size in order to achieve optimal convergence.<n>SOLO enables Adam-styles to maintain quantized states with precision as low as 3 bits, or even 2 bits.<n>SOLO can thus be seamlessly applied to Adam-styles, leading to substantial memory savings with minimal accuracy loss.
arXiv Detail & Related papers (2025-05-01T06:47:45Z)
On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Gradient Descent with gradient clipping is a powerful technique for enabling differentially private optimization.<n>This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD)<n>We show how QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD but can be mitigated through a carefully designed quantile and step size schedule.
arXiv Detail & Related papers (2024-12-27T20:29:47Z)
Gradient-based Automatic Mixed Precision Quantization for Neural Networks On-Chip [0.9187138676564589]
We present High Granularity Quantization (HGQ), an innovative quantization-aware training method. HGQ fine-tune the per-weight and per-activation precision by making them optimizable through gradient descent. This approach enables ultra-low latency and low power neural networks on hardware capable of performing arithmetic operations.
arXiv Detail & Related papers (2024-05-01T17:18:46Z)
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization [7.392278887917975]
We propose a mixed-precision post training quantization approach that assigns different numerical precisions to tensors in a network based on their specific needs. Our experiments demonstrate latency reductions compared to a 16-bit baseline of $25.48%$, $21.69%$, and $33.28%$ respectively.
arXiv Detail & Related papers (2023-06-08T02:18:58Z)
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification [51.81850995661478]
Mixed-precision quantization has been widely applied on deep neural networks (DNNs) Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence. We propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability.
arXiv Detail & Related papers (2022-12-06T05:44:21Z)
Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training. We show that the resulting problem is strongly dual and does away with gradient estimations. We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z)
DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning. A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the gradient descent (SGD) commonly used for training. We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z)
BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation. Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration. This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
Low-Precision Stochastic Gradient Langevin Dynamics [70.69923368584588]
We provide the first study of low-precision Gradient Langevin Dynamics, showing that its costs can be significantly reduced without sacrificing performance. We develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.
arXiv Detail & Related papers (2022-06-20T17:25:41Z)
AMED: Automatic Mixed-Precision Quantization for Edge Devices [3.5223695602582614]
Quantized neural networks are well known for reducing the latency, power consumption, and model size without significant harm to the performance. Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths.
arXiv Detail & Related papers (2022-05-30T21:23:22Z)
Sharpness-aware Quantization for Deep Neural Networks [45.150346855368]
Sharpness-Aware Quantization (SAQ) is a novel method to explore the effect of Sharpness-Aware Minimization (SAM) on model compression. We show that SAQ improves the generalization performance of the quantized models, yielding the SOTA results in uniform quantization.
arXiv Detail & Related papers (2021-11-24T05:16:41Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.