Sharpness-Aware Data Generation for Zero-shot Quantization
- URL: http://arxiv.org/abs/2510.07018v1
- Date: Wed, 08 Oct 2025 13:43:39 GMT
- Title: Sharpness-Aware Data Generation for Zero-shot Quantization
- Authors: Dung Hoang-Anh, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do
- Abstract summary: Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to original real training data. This paper introduces a novel methodology that takes into account quantized model sharpness in synthetic data generation to enhance generalization.
- Score: 36.10612846041737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to original real training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing the full-precision model. While it is well-known that deep neural networks with low sharpness have better generalization ability, none of the previous zero-shot quantization works considers the sharpness of the quantized model as a criterion for generating training data. This paper introduces a novel methodology that takes into account quantized model sharpness in synthetic data generation to enhance generalization. Specifically, we first demonstrate that sharpness minimization can be attained by maximizing gradient matching between the reconstruction loss gradients computed on synthetic and real validation data, under certain assumptions. We then circumvent the need for a real validation set by approximating this gradient matching with the gradient matching between each generated sample and its neighbors. Experimental evaluations on CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over the state-of-the-art techniques in low-bit quantization settings.
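A minimal sketch of the core idea, assuming a PyTorch setup: treat the agreement between reconstruction-loss gradients at a synthetic sample and at a nearby perturbed "neighbor" as a proxy for low sharpness of the quantized model, and add its negative as a term in the data-generation objective. The names `recon_loss`, `quant_model`, and the noise scale `eps` are illustrative assumptions, not the authors' API.

```python
import torch
import torch.nn.functional as F

def gradient_matching_term(quant_model, recon_loss, x, eps=1e-2):
    """Negative cosine similarity between reconstruction-loss gradients
    computed on a synthetic sample x and on a perturbed neighbor of x."""
    params = [p for p in quant_model.parameters() if p.requires_grad]
    x_neighbor = x + eps * torch.randn_like(x)   # a cheap stand-in for a "neighbor" sample

    g_x = torch.autograd.grad(recon_loss(quant_model, x), params, create_graph=True)
    g_n = torch.autograd.grad(recon_loss(quant_model, x_neighbor), params, create_graph=True)
    g_x = torch.cat([g.reshape(-1) for g in g_x])
    g_n = torch.cat([g.reshape(-1) for g in g_n])

    # Minimizing the negative cosine similarity maximizes gradient matching.
    return -F.cosine_similarity(g_x, g_n, dim=0)
```

During generation, a term like this would be minimized with respect to the synthetic images alongside the usual synthesis losses (e.g., BN-statistics matching), so the images themselves are steered toward flat regions of the quantized model's loss.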
Related papers
- Provable Unlearning with Gradient Ascent on Two-Layer ReLU Neural Networks [30.766189455525765]
Unlearning aims to remove specific data from trained models, addressing growing privacy and ethical concerns. We provide a theoretical analysis of a simple and widely used method: gradient ascent. We show that gradient ascent performs successful unlearning while still preserving generalization in a synthetic Gaussian-mixture setting.
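For context, a minimal sketch of the gradient-ascent baseline that the analysis covers (the paper's contribution is the theory, not this loop; `forget_loader`, the learning rate, and the epoch count are assumptions for illustration):

```python
import torch

def gradient_ascent_unlearning(model, loss_fn, forget_loader, lr=1e-3, epochs=1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in forget_loader:
            opt.zero_grad()
            (-loss_fn(model(x), y)).backward()  # ascend the loss on the data to be removed
            opt.step()
    return model
```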
arXiv Detail & Related papers (2025-10-16T16:16:36Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z) - Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning [25.226734510004984]
We propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low-precision quantized model without any data or fine-tuning. DF-MPC achieves higher accuracy for ultra-low-precision quantized models than recent methods, again without any data or fine-tuning.
arXiv Detail & Related papers (2023-07-02T07:16:29Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z) - Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real data in the calibration step of post-training quantization.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
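A rough sketch of that workflow, assuming PyTorch post-training quantization with observers already inserted (the `generator`, `z_dim`, and batch counts are illustrative assumptions):

```python
import torch

@torch.no_grad()
def calibrate_with_synthetic_data(prepared_model, generator, num_batches=32,
                                  batch_size=64, z_dim=512):
    """prepared_model: a model with observers inserted, e.g. via torch.ao.quantization.prepare."""
    prepared_model.eval()
    for _ in range(num_batches):
        z = torch.randn(batch_size, z_dim)
        fake_images = generator(z)       # synthetic stand-ins for real calibration images
        prepared_model(fake_images)      # observers record activation ranges
    return prepared_model
```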
arXiv Detail & Related papers (2023-05-10T11:10:09Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning neurons with the lowest magnitudes can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss [4.877532217193618]
Existing quantization techniques rely heavily on experience and "fine-tuning" skills.
This study provides a methodology for acquiring a mixed-precision quantization model with a lower loss than the full-precision model.
In particular, we will demonstrate that neural networks with massive identity mappings are resistant to the quantization method.
arXiv Detail & Related papers (2022-07-20T10:55:34Z) - ClusterQ: Semantic Feature Distribution Alignment for Data-Free
Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z) - Sharpness-aware Quantization for Deep Neural Networks [45.150346855368]
Sharpness-Aware Quantization (SAQ) is a novel method to explore the effect of Sharpness-Aware Minimization (SAM) on model compression.
We show that SAQ improves the generalization performance of the quantized models, yielding state-of-the-art results in uniform quantization.
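A minimal SAM-style update of the kind SAQ builds on (a generic sketch, not the authors' implementation; `model`, `loss_fn`, and `rho` are placeholders):

```python
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    # First pass: gradient at the current weights.
    loss_fn(model, batch).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))

    # Ascent step: perturb each weight along its normalized gradient.
    perturbations = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations[p] = e
    model.zero_grad()

    # Second pass: the gradient at the perturbed (worst-case) weights drives the update.
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in perturbations.items():
            p.sub_(e)                    # restore the original weights
    base_optimizer.step()
    base_optimizer.zero_grad()
```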
arXiv Detail & Related papers (2021-11-24T05:16:41Z) - NeRF in detail: Learning to sample for view synthesis [104.75126790300735]
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis results.
In this work we address a clear limitation of the vanilla coarse-to-fine approach: it is based on a heuristic and not trained end-to-end for the task at hand.
We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture.
arXiv Detail & Related papers (2021-06-09T17:59:10Z) - Zero-shot Adversarial Quantization [11.722728148523366]
We propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer.
This is achieved by a novel two-level discrepancy modeling scheme that drives a generator to synthesize informative and diverse data examples.
We conduct extensive experiments on three fundamental vision tasks, demonstrating the superiority of ZAQ over the strong zero-shot baselines.
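The adversarial loop, sketched under simplifying assumptions (the actual ZAQ objective uses richer two-level output and feature discrepancies; `discrepancy` and the optimizers here are illustrative):

```python
import torch
import torch.nn.functional as F

def discrepancy(fp_model, q_model, x):
    # Disagreement between full-precision and quantized predictions.
    return F.l1_loss(q_model(x), fp_model(x).detach())

def zaq_step(generator, fp_model, q_model, g_opt, q_opt, z):
    # Generator step: synthesize inputs on which the two models disagree the most.
    x = generator(z)
    g_opt.zero_grad()
    (-discrepancy(fp_model, q_model, x)).backward()
    g_opt.step()

    # Quantized-model step: close the gap on freshly generated data.
    x = generator(z).detach()
    q_opt.zero_grad()
    discrepancy(fp_model, q_model, x).backward()
    q_opt.step()
```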
arXiv Detail & Related papers (2021-03-29T01:33:34Z) - Generative Zero-shot Network Quantization [41.75769117366117]
Convolutional neural networks are able to learn realistic image priors from numerous training samples in low-level image generation and restoration.
We show that, for high-level image recognition tasks, we can further reconstruct "realistic" images of each category by leveraging intrinsic Batch Normalization (BN) statistics without any training data.
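A common recipe behind such BN-statistics-driven reconstruction, sketched as an assumption about the general approach rather than this paper's exact code: synthesize images whose batch statistics at every BatchNorm layer match the running statistics stored in the pre-trained model.

```python
import torch
import torch.nn as nn

def bn_statistics_loss(fp_model, images):
    losses, hooks = [], []

    def hook(module, inp, out):
        x = inp[0]
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        losses.append(torch.norm(mean - module.running_mean) +
                      torch.norm(var - module.running_var))

    for m in fp_model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(hook))
    fp_model(images)          # one forward pass collects per-layer statistic mismatches
    for h in hooks:
        h.remove()
    return sum(losses)
```

Minimizing this loss (typically together with a class-conditional cross-entropy term) with respect to the pixel values yields the "realistic" images referred to above, without touching any real training data.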
arXiv Detail & Related papers (2021-01-21T04:10:04Z)