Fine-grained Data Distribution Alignment for Post-Training Quantization
- URL: http://arxiv.org/abs/2109.04186v1
- Date: Thu, 9 Sep 2021 11:45:52 GMT
- Title: Fine-grained Data Distribution Alignment for Post-Training Quantization
- Authors: Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei
Chao, Yongjian Wu, Feiyue Huang, Rongrong Ji
- Abstract summary: We propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization.
Our method achieves state-of-the-art performance on ImageNet, especially when the first and last layers are quantized to low bit-widths.
- Score: 100.82928284439271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While post-training quantization is popular largely because it avoids
access to the original, complete training dataset, this very limitation also
underlies its poor performance. To alleviate it, in this paper we combine the
synthetic data introduced by zero-shot quantization with the calibration dataset
and propose a fine-grained data distribution
alignment (FDDA) method to boost the performance of post-training quantization.
The method is based on two important properties of batch normalization
statistics (BNS) we observed in deep layers of the trained network, i.e.,
inter-class separation and intra-class incohesion. To preserve this
fine-grained distribution information: 1) We calculate the per-class BNS of the
calibration dataset as the BNS centers of each class and propose a
BNS-centralized loss to force the synthetic data distributions of different
classes to be close to their own centers. 2) We add Gaussian noise into the
centers to imitate the incohesion and propose a BNS-distorted loss to force the
synthetic data distribution of the same class to be close to the distorted
centers. By introducing these two fine-grained losses, our method achieves
state-of-the-art performance on ImageNet, especially when the first and last
layers are also quantized to low bit-widths. Our project is available at
https://github.com/viperit/FDDA.
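As a rough illustration of the two losses described above, the following PyTorch-style sketch treats the BNS of a layer as the per-channel mean and variance of its BatchNorm input and measures closeness to the class centers with a simple squared-error distance; the helper names (collect_bn_inputs, bns_losses, class_bns_centers, noise_std) are hypothetical and are not taken from the authors' repository.

```python
import torch
import torch.nn as nn


def collect_bn_inputs(model, images):
    """Forward `images` through `model` and record the per-channel mean and
    variance of the input to every BatchNorm2d layer (the BNS)."""
    stats, hooks = [], []

    def hook(module, inputs, output):
        x = inputs[0]
        stats.append((x.mean(dim=(0, 2, 3)), x.var(dim=(0, 2, 3))))

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(hook))
    model(images)
    for h in hooks:
        h.remove()
    return stats


def bns_losses(model, synthetic_images, class_bns_centers, noise_std=0.1):
    """`synthetic_images` are assumed to belong to a single class whose
    precomputed calibration BNS centers (one (mean, var) pair per BN layer)
    are given in `class_bns_centers`. Returns the BNS-centralized loss and
    the BNS-distorted loss for that batch."""
    stats = collect_bn_inputs(model, synthetic_images)
    centralized = synthetic_images.new_zeros(())
    distorted = synthetic_images.new_zeros(())
    for (mu, var), (mu_c, var_c) in zip(stats, class_bns_centers):
        # 1) BNS-centralized: pull the synthetic statistics toward the class center.
        centralized = centralized + (mu - mu_c).pow(2).mean() + (var - var_c).pow(2).mean()
        # 2) BNS-distorted: pull them toward a Gaussian-perturbed copy of the center
        #    to imitate the intra-class incohesion observed in deep layers.
        mu_d = mu_c + noise_std * torch.randn_like(mu_c)
        var_d = var_c + noise_std * torch.randn_like(var_c)
        distorted = distorted + (mu - mu_d).pow(2).mean() + (var - var_d).pow(2).mean()
    return centralized, distorted
```

Under these assumptions, the per-class centers themselves could be obtained by running collect_bn_inputs over the calibration images of each class and averaging, and the synthetic images would then be optimized against a weighted sum of the two losses.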
Related papers
- Double-Bounded Optimal Transport for Advanced Clustering and
Classification [58.237576976486544]
We propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of a fixed one.
We show that our method can achieve good results with our improved inference scheme in the testing stage.
arXiv Detail & Related papers (2024-01-21T07:43:01Z)
- SEMI-CenterNet: A Machine Learning Facilitated Approach for
Semiconductor Defect Inspection [0.10555513406636088]
We have proposed SEMI-CenterNet (SEMI-CN), a customized CN architecture trained on SEM images of semiconductor wafer defects.
SEMI-CN gets trained to output the center, class, size, and offset of a defect instance.
We train SEMI-CN on two datasets and benchmark two ResNet backbones for the framework.
arXiv Detail & Related papers (2023-08-14T14:39:06Z)
- Proposal Distribution Calibration for Few-Shot Object Detection [65.19808035019031]
In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance.
Unfortunately, the extreme data scarcity aggravates the proposal distribution bias, hindering the RoI head from evolving toward novel classes.
We introduce a simple yet effective proposal distribution calibration (PDC) approach to neatly enhance the localization and classification abilities of the RoI head.
arXiv Detail & Related papers (2022-12-15T05:09:11Z)
- ClusterQ: Semantic Feature Distribution Alignment for Data-Free
Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z)
- Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data [17.7825114228313]
Corrupted labels and class imbalance are commonly encountered in practically collected training data.
Existing approaches alleviate these issues by adopting a sample re-weighting strategy.
However, biased samples with corrupted labels and of tailed classes commonly co-exist in training data.
arXiv Detail & Related papers (2021-12-30T09:20:07Z)
- Provable Generalization of SGD-trained Neural Networks of Any Width in
the Presence of Adversarial Label Noise [85.59576523297568]
We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by gradient descent.
We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution.
arXiv Detail & Related papers (2021-01-04T18:32:49Z)
- Stochastic Batch Augmentation with An Effective Distilled Dynamic Soft
Label Regularizer [11.153892464618545]
We propose a framework called Stochastic Batch Augmentation (SBA) to address these problems.
SBA decides whether to augment at each iteration via a batch scheduler, and introduces a "distilled" dynamic soft label regularization.
Our experiments on CIFAR-10, CIFAR-100, and ImageNet show that SBA can improve the generalization of the neural networks and speed up the convergence of network training.
arXiv Detail & Related papers (2020-06-27T04:46:39Z)
- Passive Batch Injection Training Technique: Boosting Network Performance
by Injecting Mini-Batches from a different Data Distribution [39.8046809855363]
This work presents a novel training technique for deep neural networks that makes use of additional data from a distribution that is different from that of the original input data.
To the best of our knowledge, this is the first work that makes use of a different data distribution to aid the training of convolutional neural networks (CNNs).
arXiv Detail & Related papers (2020-06-08T08:17:32Z)
- Deep Active Learning for Biased Datasets via Fisher Kernel
Self-Supervision [5.352699766206807]
Active learning (AL) aims to minimize labeling effort for data-demanding deep neural networks (DNNs).
We propose a low-complexity method for feature density matching using a self-supervised Fisher kernel (FK).
Our method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of the processing.
arXiv Detail & Related papers (2020-03-01T03:56:32Z)
- Generalized ODIN: Detecting Out-of-distribution Image without Learning
from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
We specifically propose to decompose confidence scoring as well as a modified input pre-processing method.
Further analysis on a larger-scale image dataset shows that the two types of distribution shift, namely semantic shift and non-semantic shift, present a significant difference.
arXiv Detail & Related papers (2020-02-26T04:18:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.