HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving
Generalization and Quantization Performance
- URL: http://arxiv.org/abs/2111.11986v1
- Date: Tue, 23 Nov 2021 16:32:58 GMT
- Title: HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving
Generalization and Quantization Performance
- Authors: Huanrui Yang, Xiaoxuan Yang, Neil Zhenqiang Gong and Yiran Chen
- Abstract summary: We propose HERO, a Hessian-enhanced robust optimization method, to minimize the Hessian eigenvalues through a gradient-based training process.
HERO enables up to a 3.8% gain on test accuracy, up to 30% higher accuracy under 80% training label perturbation, and the best post-training quantization accuracy across a wide range of precision.
- Score: 43.478851400266926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the recent demand for deploying neural network models on mobile
and edge devices, it is desirable to improve a model's generalizability on unseen
testing data, as well as to enhance its robustness under fixed-point quantization
for efficient deployment. Minimizing the training loss, however, provides few
guarantees on generalization and quantization performance. In this work, we
address the need to improve generalization and quantization performance
simultaneously by theoretically unifying them under the framework
of improving the model's robustness against bounded weight perturbation and
minimizing the eigenvalues of the Hessian matrix with respect to model weights.
We therefore propose HERO, a Hessian-enhanced robust optimization method, to
minimize the Hessian eigenvalues through a gradient-based training process,
simultaneously improving the generalization and quantization performance. HERO
enables up to a 3.8% gain on test accuracy, up to 30% higher accuracy under 80%
training label perturbation, and the best post-training quantization accuracy
across a wide range of precision, including a >10% accuracy improvement over
SGD-trained models for common model architectures on various datasets.
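The abstract's central link between weight-perturbation robustness and the Hessian follows from a second-order Taylor expansion: for a bounded perturbation $\epsilon$, $L(w+\epsilon) \approx L(w) + \epsilon^\top \nabla L(w) + \frac{1}{2}\epsilon^\top H \epsilon$, so the worst-case perturbed loss grows with the largest Hessian eigenvalues. The PyTorch sketch below illustrates one way to train against such a worst-case perturbation, in the style of sharpness-aware minimization; it is a minimal illustration rather than the official HERO implementation, and the perturbation radius rho and all function names are assumptions for this example.

```python
# Minimal sketch of a perturbation-based robust training step (in the spirit of
# sharpness-aware minimization), NOT the official HERO implementation.
# Minimizing the loss at an adversarially perturbed weight point implicitly
# penalizes large Hessian eigenvalues, which the HERO abstract ties to both
# generalization and post-training quantization robustness.
# Assumptions: a PyTorch model and optimizer, perturbation radius `rho`.

import torch

def robust_step(model, loss_fn, x, y, optimizer, rho=0.05):
    # 1) Gradient of the loss at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) Approximate the worst-case bounded perturbation epsilon (||epsilon||_2 <= rho)
    #    with a normalized gradient-ascent step, and move the weights to w + epsilon.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()

    # 3) The gradient of the perturbed loss L(w + epsilon) drives the update,
    #    so the optimizer minimizes the worst-case (sharp-region) loss.
    loss_fn(model(x), y).backward()

    # 4) Restore the original weights, then apply the optimizer update.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

A call such as robust_step(model, torch.nn.functional.cross_entropy, x, y, optimizer) would replace a plain training step; the extra forward and backward pass at the perturbed weights roughly doubles the per-step cost.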
Related papers
- Optimizing Deep Neural Networks using Safety-Guided Self Compression [0.0]
This study introduces a novel safety-driven quantization framework that prunes and quantizes neural network weights.
The proposed methodology is rigorously evaluated on both a convolutional neural network (CNN) and an attention-based language model.
Experimental results reveal that our framework achieves up to a 2.5% enhancement in test accuracy relative to the original unquantized models.
arXiv Detail & Related papers (2025-05-01T06:50:30Z)
- Improving Quantization with Post-Training Model Expansion [0.35377121774178694]
Post-training model expansion is a viable strategy to improve model quality within a quantization co-design space.
We show it is possible to progressively and selectively expand the size of a pre-trained large language model (LLM) to improve model quality without end-to-end retraining.
arXiv Detail & Related papers (2025-03-21T19:56:59Z)
- QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models [3.093903491123962]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Structured pruning is an effective approach to reducing model size, but it often results in significant accuracy degradation.
We introduce quantization into the structured pruning framework to reduce memory consumption during both fine-tuning and inference.
We propose QPruner, a novel framework that employs structured pruning to reduce model size, followed by a layer-wise mixed-precision quantization scheme.
arXiv Detail & Related papers (2024-12-16T10:14:01Z)
- GAQAT: Gradient-Adaptive Quantization-Aware Training for Domain Generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for DG.
Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.
Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z)
- QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG).
QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
We demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms.
arXiv Detail & Related papers (2024-10-08T13:21:48Z)
- TRAWL: Tensor Reduced and Approximated Weights for Large Language Models [11.064868044313855]
We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns.
Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
arXiv Detail & Related papers (2024-06-25T04:01:32Z)
- Clipped Uniform Quantizers for Communication-Efficient Federated Learning [3.38220960870904]
This paper introduces an approach to employ clipped uniform quantization in federated learning settings.
By employing optimal clipping thresholds and adaptive quantization schemes, our method significantly curtails the bit requirements for model weight transmissions.
arXiv Detail & Related papers (2024-05-22T05:48:25Z)
- Meta-learning to Calibrate Gaussian Processes with Deep Kernels for Regression Uncertainty Estimation [43.23399636191726]
We propose a meta-learning method for calibrating deep kernel GPs for improving regression uncertainty estimation performance.
The proposed method meta-learns how to calibrate uncertainty using data from various tasks by minimizing the test expected calibration error.
Our experiments demonstrate that the proposed method improves uncertainty estimation performance while keeping high regression performance.
arXiv Detail & Related papers (2023-12-13T07:58:47Z)
- Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we build an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered by a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.