Synergistic Self-supervised and Quantization Learning
- URL: http://arxiv.org/abs/2207.05432v1
- Date: Tue, 12 Jul 2022 09:55:10 GMT
- Title: Synergistic Self-supervised and Quantization Learning
- Authors: Yun-Hao Cao, Peiqin Sun, Yechang Huang, Jianxin Wu, Shuchang Zhou
- Abstract summary: We propose a method called synergistic self-supervised and quantization learning (SSQL) to pretrain quantization-friendly self-supervised models.
By only training once, SSQL can then benefit various downstream tasks at different bit-widths simultaneously.
- Score: 24.382347077407303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the success of self-supervised learning (SSL), it has become a
mainstream paradigm to fine-tune from self-supervised pretrained models to
boost the performance on downstream tasks. However, we find that current SSL
models suffer severe accuracy drops when performing low-bit quantization,
prohibiting their deployment in resource-constrained applications. In this
paper, we propose a method called synergistic self-supervised and quantization
learning (SSQL) to pretrain quantization-friendly self-supervised models
facilitating downstream deployment. SSQL contrasts the features of the
quantized and full precision models in a self-supervised fashion, where the
bit-width for the quantized model is randomly selected in each step. SSQL not
only significantly improves the accuracy when quantized to lower bit-widths,
but also boosts the accuracy of full precision models in most cases. By only
training once, SSQL can then benefit various downstream tasks at different
bit-widths simultaneously. Moreover, the bit-width flexibility is achieved
without additional storage overhead, requiring only one copy of weights during
training and inference. We theoretically analyze the optimization process of
SSQL, and conduct exhaustive experiments on various benchmarks to further
demonstrate the effectiveness of our method. Our code is available at
https://github.com/megvii-research/SSQL-ECCV2022.
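The abstract describes the core mechanism at a high level: keep one shared copy of weights, draw a random bit-width at each step, fake-quantize that copy on the fly, and pull its features toward those of the full-precision branch with a self-supervised objective. The following is a minimal sketch of that idea under simplifying assumptions of our own (a toy MLP encoder, per-tensor symmetric uniform weight quantization with a straight-through estimator, activations left in full precision, and a SimSiam-style negative-cosine loss with stop-gradient); the actual SSQL implementation in the linked repository differs in architecture, quantizer, and loss details.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Per-tensor symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # forward: quantized value; backward: identity


class Encoder(nn.Module):
    """Toy MLP standing in for the SSL backbone plus projection head."""

    def __init__(self, dim_in: int = 32, dim_out: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))

    def forward(self, x: torch.Tensor, bits: int | None = None) -> torch.Tensor:
        if bits is None:  # full-precision branch
            return self.net(x)
        h = x
        for layer in self.net:  # quantized branch reuses the same weights on the fly
            if isinstance(layer, nn.Linear):
                h = F.linear(h, fake_quantize(layer.weight, bits), layer.bias)
            else:
                h = layer(h)
        return h


def ssql_step(model: Encoder, x1: torch.Tensor, x2: torch.Tensor,
              bit_choices=(2, 3, 4, 8)) -> torch.Tensor:
    """One training step: contrast quantized features against full-precision ones."""
    bits = random.choice(bit_choices)   # random bit-width each step
    z_fp = model(x1)                    # full-precision view
    z_q = model(x2, bits=bits)          # quantized view (same weight copy)
    # Negative cosine similarity with stop-gradient on the full-precision branch.
    return -F.cosine_similarity(z_q, z_fp.detach(), dim=-1).mean()


model = Encoder()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x = torch.randn(8, 32)
loss = ssql_step(model, x + 0.01 * torch.randn_like(x), x)
loss.backward()
opt.step()
```

Because the quantized branch is produced on the fly from the shared weights, the sketch keeps only one copy of parameters during training and inference, in line with the storage claim in the abstract.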
Related papers
- Nearly Lossless Adaptive Bit Switching [8.485009775430411]
Experimental results on ImageNet-1K classification demonstrate that our methods have clear advantages over state-of-the-art one-shot joint QAT in both multi-precision and mixed-precision settings.
arXiv Detail & Related papers (2025-02-03T09:46:26Z)
- Test-Time Alignment via Hypothesis Reweighting [56.71167047381817]
Large pretrained models often struggle with underspecified tasks.
We propose a novel framework to address the challenge of aligning models to test-time user intent.
arXiv Detail & Related papers (2024-12-11T23:02:26Z)
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Can recurrent neural networks learn process model structure? [0.2580765958706854]
We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization.
We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data.
We also find that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores.
arXiv Detail & Related papers (2022-12-13T08:40:01Z)
- A Semiparametric Efficient Approach To Label Shift Estimation and Quantification [0.0]
We present a new procedure called SELSE which estimates the shift in the response variable's distribution.
We prove that SELSE's normalized error has the smallest possible variance matrix compared to any other algorithm in that family.
arXiv Detail & Related papers (2022-11-07T07:49:29Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization [1.9786767260073905]
Transformer-based language models such as BERT have shown tremendous performance improvements for a range of natural language processing tasks.
We propose a novel quantization method named KDLSQ-BERT that combines knowledge distillation (KD) with learned step size quantization (LSQ) for language model quantization. A minimal sketch of the LSQ quantizer appears after this list.
arXiv Detail & Related papers (2021-01-15T02:21:28Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit".
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
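The KDLSQ-BERT entry above combines knowledge distillation with learned step size quantization (LSQ, Esser et al.). As a point of reference for how a learned step size works, here is a minimal, self-contained sketch of an LSQ-style weight quantizer alone; the distillation component and all BERT specifics are omitted, and the class name, defaults, and usage are illustrative assumptions rather than the paper's implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSQQuantizer(nn.Module):
    """Learned step size quantization (LSQ), minimal weight-only sketch.

    The step size `s` is a trainable parameter; rounding uses a
    straight-through estimator, and the gradient into `s` is rescaled
    by 1 / sqrt(num_elements * Q_P), as suggested in the LSQ paper.
    """

    def __init__(self, bits: int = 4, num_elements: int = 1):
        super().__init__()
        self.qn = 2 ** (bits - 1)        # magnitude of the negative clip bound
        self.qp = 2 ** (bits - 1) - 1    # positive clip bound
        self.s = nn.Parameter(torch.tensor(1.0))
        self.grad_scale = 1.0 / math.sqrt(num_elements * self.qp)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Rescale the gradient flowing into s without changing its forward value.
        s = self.s * self.grad_scale + (self.s - self.s * self.grad_scale).detach()
        v = torch.clamp(w / s, -self.qn, self.qp)
        v_bar = v + (torch.round(v) - v).detach()  # straight-through round
        return v_bar * s


# Toy usage: quantize a linear layer's weights to 4 bits during the forward pass.
layer = nn.Linear(16, 8)
quant = LSQQuantizer(bits=4, num_elements=layer.weight.numel())
x = torch.randn(2, 16)
w_q = quant(layer.weight)
y = F.linear(x, w_q, layer.bias)
y.sum().backward()  # gradients reach both layer.weight and quant.s
```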
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.