QT-DoG: Quantization-aware Training for Domain Generalization
- URL: http://arxiv.org/abs/2410.06020v1
- Date: Tue, 8 Oct 2024 13:21:48 GMT
- Title: QT-DoG: Quantization-aware Training for Domain Generalization
- Authors: Saqib Javed, Hieu Le, Mathieu Salzmann
- Abstract summary: We propose Quantization-aware Training for Domain Generalization (QT-DoG)
QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
We demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms.
- Score: 58.439816306817306
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Domain Generalization (DG) aims to train models that perform well not only on the training (source) domains but also on novel, unseen target data distributions. A key challenge in DG is preventing overfitting to source domains, which can be mitigated by finding flatter minima in the loss landscape. In this work, we propose Quantization-aware Training for Domain Generalization (QT-DoG) and demonstrate that weight quantization effectively leads to flatter minima in the loss landscape, thereby enhancing domain generalization. Unlike traditional quantization methods focused on model compression, QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights, guiding the optimization process toward flatter minima that are less sensitive to perturbations and overfitting. We provide both theoretical insights and empirical evidence demonstrating that quantization inherently encourages flatter minima, leading to better generalization across domains. Moreover, since quantization also reduces model size, we demonstrate that an ensemble of multiple quantized models achieves accuracy superior to that of state-of-the-art DG approaches, with no computational or memory overhead. Our extensive experiments demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms, and can be combined with other DG methods, establishing its versatility and robustness.
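The abstract's core idea, quantization acting as implicit weight noise, can be sketched with a toy uniform quantizer. This is a minimal illustration assuming symmetric per-tensor quantization; QT-DoG's actual quantization scheme may differ.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniform symmetric quantization followed by dequantization.

    A minimal sketch, not QT-DoG's actual quantizer: it only shows
    how quantization perturbs each weight by a bounded amount.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized (low-precision) weights

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=1000)    # toy weight tensor
w_q = quantize_uniform(w, num_bits=4)  # 4-bit weights
noise = w_q - w                        # implicit perturbation from quantization
print(noise.std(), np.abs(noise).max())
```

Training with such perturbed weights, as in quantization-aware training where a straight-through estimator passes gradients through the rounding, effectively penalizes sharp minima: solutions that are sensitive to this bounded noise incur higher loss.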
Related papers
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
"pre-train, prompt-tuning" has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs)
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
- Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization [44.04047359057987]
We develop a new domain generalization algorithm named Flatness-aware Gradient-based Mixup (FGMix)
FGMix learns the similarity function towards flatter minima for better generalization.
On the DomainBed benchmark, we validate the efficacy of various designs of FGMix and demonstrate its superiority over other DG algorithms.
arXiv Detail & Related papers (2022-09-29T13:01:14Z)
- On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation [6.2200089460762085]
Methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models.
We develop an information-theoretic bound on the generalization error of the resulting target model.
We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment.
arXiv Detail & Related papers (2022-02-01T22:34:18Z)
- Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data.
We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG)
Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z)
- Zero-shot Adversarial Quantization [11.722728148523366]
We propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer.
This is achieved by a novel two-level discrepancy modeling to drive a generator to synthesize informative and diverse data examples.
We conduct extensive experiments on three fundamental vision tasks, demonstrating the superiority of ZAQ over the strong zero-shot baselines.
arXiv Detail & Related papers (2021-03-29T01:33:34Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
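The robustness claim above rests on a first-order argument: for weight perturbations bounded by eps per coordinate, the loss change is bounded by eps times the $\ell_1$ norm of the gradient, which is why penalizing that norm helps. Below is a toy numerical check of this bound; the loss, dimensions, and constants are illustrative assumptions, not the paper's code.

```python
import numpy as np

# First-order view: if |delta_i| <= eps for every weight, then
# |L(w + delta) - L(w)| <~ eps * ||grad L(w)||_1 up to second-order terms,
# so penalizing the gradient's l1 norm bounds the loss change under
# quantization-style perturbations.

def loss(w):
    return 0.5 * np.sum(w ** 2)  # toy quadratic loss

def grad(w):
    return w  # analytic gradient of the toy loss

rng = np.random.default_rng(0)
w = rng.normal(size=50)
eps = 1e-3                                  # max per-weight quantization error
delta = rng.uniform(-eps, eps, size=50)     # bounded perturbation

first_order_bound = eps * np.sum(np.abs(grad(w)))
actual_change = abs(loss(w + delta) - loss(w))
print(actual_change, first_order_bound)
```

In practice the bound is loose (the perturbation rarely aligns with the gradient's sign pattern), but it explains why networks trained with a small gradient $\ell_1$ norm can be quantized post hoc to different bit-widths with little loss degradation.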
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
- SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks [29.187848543158992]
We present a new quantized deep neural network optimization approach, stochastic quantized weight averaging (SQWA).
The proposed approach comprises floating-point model training, direct quantization of the weights, capturing multiple low-precision models, averaging the captured models, and fine-tuning the average with low learning rates.
With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-02T07:02:51Z)
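The averaging step described for SQWA can be sketched numerically: several low-precision snapshots of the same underlying weights are averaged, which cancels part of the independent quantization and training noise. The quantizer, noise model, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_uniform(w, num_bits=4):
    """Uniform symmetric quantization followed by dequantization (sketch)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
w_star = rng.normal(size=100)  # stand-in for well-trained float weights

# capture several low-precision models along a noisy training trajectory
snapshots = [quantize_uniform(w_star + rng.normal(0.0, 0.2, size=100))
             for _ in range(8)]

# average the captured models; SQWA then fine-tunes with low learning rates
w_avg = np.mean(snapshots, axis=0)

err_single = np.linalg.norm(snapshots[0] - w_star)
err_avg = np.linalg.norm(w_avg - w_star)
print(err_single, err_avg)
```

Because the per-snapshot perturbations are partly independent, the averaged weights sit closer to the underlying solution than any single low-precision snapshot, mirroring the flat-minima intuition shared with QT-DoG.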
This list is automatically generated from the titles and abstracts of the papers in this site.