ZOQO: Zero-Order Quantized Optimization
- URL: http://arxiv.org/abs/2501.06736v1
- Date: Sun, 12 Jan 2025 07:15:55 GMT
- Title: ZOQO: Zero-Order Quantized Optimization
- Authors: Noga Bar, Raja Giryes
- Abstract summary: We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations. Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations.
- Score: 31.43307762723943
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The increasing computational and memory demands in deep learning present significant challenges, especially in resource-constrained environments. We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations. Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations. We demonstrate the effectiveness of ZOQO through experiments in fine-tuning of large language models and black-box adversarial attacks. Despite the limitations of zero-order and quantized operations training, our method achieves competitive performance compared to full-precision methods, highlighting its potential for low-resource environments.
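To make the update concrete, here is a minimal sketch of a sign-based zero-order step over a uniform quantization grid. It follows the abstract's description (two loss evaluations estimate the sign of a directional derivative, and parameters move by whole quantization steps), but the function names, the single random probe direction, and the toy loss are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize(w, step):
    """Snap weights onto a uniform grid with the given step size."""
    return np.round(w / step) * step

def zo_sign_step(loss_fn, w, step, rng):
    """One zero-order, sign-based update that keeps w on the quantization grid.

    The sign of the directional derivative along a random Rademacher
    direction u is estimated from two loss evaluations, and every
    coordinate then moves by exactly one quantization step, so no
    full-precision gradients are ever computed.
    """
    u = rng.choice([-1.0, 1.0], size=w.shape)      # random sign direction
    delta = loss_fn(quantize(w + step * u, step)) - \
            loss_fn(quantize(w - step * u, step))  # two-point probe
    return quantize(w - step * np.sign(delta) * u, step)

# Toy usage: minimize a quadratic over a grid with step 1/8.
rng = np.random.default_rng(0)
loss = lambda w: float(np.sum((w - 1.0) ** 2))
w = np.zeros(8)
for _ in range(300):
    w = zo_sign_step(loss, w, step=0.125, rng=rng)
print(loss(w))  # typically close to 0
```

Because every intermediate value lies on the grid, no full-precision copy of the weights is needed at any point of the loop.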
Related papers
- Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining [0.0]
Post-training quantization reduces model size efficiently at the cost of decreased accuracy.
Quantization-aware training better preserves accuracy but is resource-intensive.
We propose an ultra-low-bit quantization method that builds upon ApiQ and extends its performance without the need for full retraining.
arXiv Detail & Related papers (2025-04-14T19:31:21Z)
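As background for the entry above, here is a minimal sketch of the symmetric uniform post-training quantization such methods start from; the bit-width and per-tensor scale are illustrative choices, not ApiQ's actual scheme.

```python
import numpy as np

def ptq_uniform(w, bits=4):
    """Symmetric uniform post-training quantization of a weight tensor.

    Picks a per-tensor scale from the largest magnitude, rounds onto
    signed integer levels, and returns the stored integer codes with
    their scale; dequantize as codes * scale.
    """
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

w = np.random.default_rng(0).normal(size=(64, 64))
codes, scale = ptq_uniform(w, bits=4)
err = np.abs(w - codes * scale).mean()
print(f"mean absolute rounding error: {err:.4f}")
```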
- A quantum gradient descent algorithm for optimizing Gaussian Process models [28.16587217223671]
We propose a quantum gradient descent algorithm to optimize the Gaussian Process model.
Our algorithm achieves exponential speedup in computing the gradients of the log marginal likelihood.
arXiv Detail & Related papers (2025-03-22T14:14:31Z)
- Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale.
We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
arXiv Detail & Related papers (2025-03-17T22:18:24Z)
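To illustrate what a metagradient is, the toy sketch below differentiates a full training run with respect to its learning rate by central finite differences. The paper's contribution is computing such gradients efficiently at scale; every name here is a hypothetical stand-in.

```python
import numpy as np

def train_then_eval(lr, steps=50):
    """Run plain SGD on a toy quadratic and return the final loss."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=4)
    target = np.ones(4)
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)          # gradient of ||w - target||^2
    return float(np.sum((w - target) ** 2))   # "validation" loss

# Metagradient of the post-training loss w.r.t. the learning rate.
lr, eps = 0.01, 1e-5
metagrad = (train_then_eval(lr + eps) - train_then_eval(lr - eps)) / (2 * eps)
print(f"d(val loss)/d(lr) at lr={lr}: {metagrad:.3f}")
```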
- Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart [1.5508907979229383]
Quantization-aware training (QAT) is a common paradigm for network quantization.
The low-precision model exhibits limited representation capabilities and cannot directly replicate full-precision calculations.
We propose a general QAT framework for alleviating the concerns by permitting the forward and backward processes of the low-precision network to be guided by the full-precision partner.
arXiv Detail & Related papers (2024-12-20T12:38:18Z)
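The mechanism underlying QAT, which the entry above builds on, is the straight-through estimator: the loss is evaluated at quantized weights while updates are applied to full-precision latent weights. A minimal sketch on a least-squares toy problem (not the paper's block-replacement scheme):

```python
import numpy as np

def fake_quant(w, step=0.1):
    """Forward-pass quantization onto a uniform grid."""
    return np.round(w / step) * step

# Straight-through estimator: the gradient is computed at the *quantized*
# weights but the update is applied to the *latent* full-precision weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 8))
y = X @ rng.normal(size=8)
w = np.zeros(8)                                # latent full-precision weights
for _ in range(200):
    wq = fake_quant(w)                         # what inference would use
    grad = 2.0 * X.T @ (X @ wq - y) / len(y)   # gradient at quantized point
    w -= 0.05 * grad                           # STE: pass it straight through
print(np.mean((X @ fake_quant(w) - y) ** 2))   # low quantized-model loss
```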
- Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task [0.0]
We introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model. Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable.
arXiv Detail & Related papers (2024-12-05T06:34:06Z)
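PACT, used in the entry above, clips activations at a learnable threshold alpha before quantizing them, and learns alpha from the inputs that hit the clip. A small sketch under assumed names and a 4-bit grid:

```python
import numpy as np

def pact_forward(x, alpha, bits=4):
    """Clip activations to [0, alpha], then quantize onto 2**bits - 1 levels."""
    y = np.clip(x, 0.0, alpha)
    step = alpha / (2 ** bits - 1)
    return np.round(y / step) * step

def pact_alpha_grad(x, alpha, upstream):
    """d(output)/d(alpha) is 1 wherever the input was clipped at alpha
    (straight-through for the rounding), 0 elsewhere."""
    return float(np.sum(upstream * (x >= alpha)))

x = np.random.default_rng(0).normal(loc=1.0, size=1000)
alpha = 2.0
print(pact_forward(x, alpha)[:5])
print(pact_alpha_grad(x, alpha, np.ones_like(x)))
```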
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate the memory and computational demands of large language models by compressing and accelerating them.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
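The Ordered Weighted Averaging objective mentioned above applies weights by rank rather than by index; under the common convention sketched here, decreasing weights emphasize the worst outcomes:

```python
import numpy as np

def owa(values, weights):
    """Ordered Weighted Average: weights are applied by rank, not by index.

    With weights sorted in decreasing order, the largest weight always
    multiplies the worst (smallest) outcome -- a fairness-style objective.
    """
    return float(np.sort(values) @ np.sort(weights)[::-1])

outcomes = np.array([0.9, 0.2, 0.5])   # per-group utilities
weights = np.array([0.5, 0.3, 0.2])    # emphasize the worst-off group
print(owa(outcomes, weights))          # 0.5*0.2 + 0.3*0.5 + 0.2*0.9 = 0.43
```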
- Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
- DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training [33.11416096294998]
Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems.
No prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance.
We develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch.
arXiv Detail & Related papers (2023-10-03T13:05:36Z)
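The zeroth-order optimization that DeepZero scales up replaces backpropagation with function evaluations. A generic coordinate-wise finite-difference estimator (the basic building block, not DeepZero's optimized variant):

```python
import numpy as np

def zo_gradient(f, w, mu=1e-3):
    """Coordinate-wise zeroth-order gradient estimate of f at w.

    Uses one forward difference per coordinate, so it needs dim(w) + 1
    function evaluations and no backpropagation at all.
    """
    base = f(w)
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = mu
        grad[i] = (f(w + e) - base) / mu
    return grad

f = lambda w: float(np.sum((w - 3.0) ** 2))
w = np.zeros(5)
for _ in range(100):
    w -= 0.1 * zo_gradient(f, w)
print(w)   # approaches 3.0 in every coordinate
```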
- QuantEase: Optimization-based Quantization for Language Models [17.333778751252392]
This work introduces QuantEase, a layer-wise Post-Training Quantization (PTQ) framework for Large Language Models (LLMs).
Our Coordinate Descent (CD)-based approach features straightforward updates, relying solely on vector operations.
We also explore an outlier-aware variant, allowing significant weights (outliers) to be retained in full precision.
arXiv Detail & Related papers (2023-09-05T01:39:09Z)
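The coordinate-descent idea behind QuantEase can be sketched for a single weight column: each coordinate has a closed-form minimizer of the layer-wise reconstruction error given the others, which is then snapped to the quantization grid. A simplified illustration under assumed notation, not the paper's exact algorithm:

```python
import numpy as np

def cd_quantize(w, H, step, sweeps=10):
    """Quantize one weight column w to a uniform grid, minimizing
    (w - q)^T H (w - q), where H = X^T X comes from calibration data.

    Each sweep visits every coordinate, solves the 1-D problem exactly
    with the others fixed, and rounds the solution to the grid.
    """
    q = np.round(w / step) * step                  # start from naive rounding
    for _ in range(sweeps):
        for k in range(w.size):
            q_star = q[k] + H[k] @ (w - q) / H[k, k]
            q[k] = np.round(q_star / step) * step
    return q

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))                     # calibration activations
H = X.T @ X
w = rng.normal(size=16)
q = cd_quantize(w, H, step=0.25)
naive = np.round(w / 0.25) * 0.25
err = lambda v: float((w - v) @ H @ (w - v))
print(err(naive), err(q))   # reconstruction error: CD is typically lower
```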
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
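The zero-shot structured pruning argued for above can be pictured as ranking whole convolutional filters by magnitude and keeping the strongest, with no retraining; the shapes and keep ratio below are illustrative:

```python
import numpy as np

def prune_filters(conv_w, keep_ratio=0.8):
    """Zero-shot structured pruning: drop whole output filters by L1 norm.

    conv_w has shape (out_channels, in_channels, kH, kW); the filters with
    the smallest L1 norms are removed, shrinking the layer's compute cost
    roughly in proportion, without any retraining.
    """
    norms = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])   # strongest filters
    return conv_w[keep], keep

w = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
pruned, kept = prune_filters(w, keep_ratio=0.8)
print(pruned.shape)   # (51, 32, 3, 3)
```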
- Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimation.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z)