ZOQO: Zero-Order Quantized Optimization
- URL: http://arxiv.org/abs/2501.06736v1
- Date: Sun, 12 Jan 2025 07:15:55 GMT
- Title: ZOQO: Zero-Order Quantized Optimization
- Authors: Noga Bar, Raja Giryes
- Abstract summary: We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations. Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations.
- Score: 31.43307762723943
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The increasing computational and memory demands in deep learning present significant challenges, especially in resource-constrained environments. We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations. Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations. We demonstrate the effectiveness of ZOQO through experiments in fine-tuning of large language models and black-box adversarial attacks. Despite the limitations of zero-order and quantized operations training, our method achieves competitive performance compared to full-precision methods, highlighting its potential for low-resource environments.
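To make the update concrete, here is a minimal sketch of a sign-based zero-order step over a uniform quantization grid. It follows the abstract's description (two loss evaluations estimate the sign of a directional derivative, and parameters move by whole quantization steps), but the function names, the single random probe direction, and the toy loss are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize(w, step):
    """Snap weights onto a uniform grid with the given step size."""
    return np.round(w / step) * step

def zo_sign_step(loss_fn, w, step, rng):
    """One zero-order, sign-based update that keeps w on the quantization grid.

    The sign of the directional derivative along a random Rademacher
    direction u is estimated from two loss evaluations, and every
    coordinate then moves by exactly one quantization step, so no
    full-precision gradients are ever computed.
    """
    u = rng.choice([-1.0, 1.0], size=w.shape)      # random sign direction
    delta = loss_fn(quantize(w + step * u, step)) - \
            loss_fn(quantize(w - step * u, step))  # two-point probe
    return quantize(w - step * np.sign(delta) * u, step)

# Toy usage: minimize a quadratic over a grid with step 1/8.
rng = np.random.default_rng(0)
loss = lambda w: float(np.sum((w - 1.0) ** 2))
w = np.zeros(8)
for _ in range(300):
    w = zo_sign_step(loss, w, step=0.125, rng=rng)
print(loss(w))  # typically close to 0
```

Because every intermediate value lies on the grid, no full-precision copy of the weights is needed at any point of the loop.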
Related papers
- Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining [0.0]
Post-training quantization reduces model size efficiently at the cost of decreased accuracy.
Quantization-aware training better preserves accuracy but is resource-intensive.
We propose an ultra-low-bit quantization method that builds upon ApiQ and extends its performance without the need for full retraining.
arXiv Detail & Related papers (2025-04-14T19:31:21Z)
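As background for the entry above, here is a minimal sketch of the symmetric uniform post-training quantization such methods start from; the bit-width and per-tensor scale are illustrative choices, not ApiQ's actual scheme.

```python
import numpy as np

def ptq_uniform(w, bits=4):
    """Symmetric uniform post-training quantization of a weight tensor.

    Picks a per-tensor scale from the largest magnitude, rounds onto
    signed integer levels, and returns the stored integer codes with
    their scale; dequantize as codes * scale.
    """
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

w = np.random.default_rng(0).normal(size=(64, 64))
codes, scale = ptq_uniform(w, bits=4)
err = np.abs(w - codes * scale).mean()
print(f"mean absolute rounding error: {err:.4f}")
```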
- A quantum gradient descent algorithm for optimizing Gaussian Process models [28.16587217223671]
We propose a quantum gradient descent algorithm to optimize the Gaussian Process model.
Our algorithm achieves exponential speedup in computing the gradients of the log marginal likelihood.
arXiv Detail & Related papers (2025-03-22T14:14:31Z)
- Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale.
We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
arXiv Detail & Related papers (2025-03-17T22:18:24Z)
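To illustrate what a metagradient is, the toy sketch below differentiates a full training run with respect to its learning rate by central finite differences. The paper's contribution is computing such gradients efficiently at scale; every name here is a hypothetical stand-in.

```python
import numpy as np

def train_then_eval(lr, steps=50):
    """Run plain SGD on a toy quadratic and return the final loss."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=4)
    target = np.ones(4)
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)          # gradient of ||w - target||^2
    return float(np.sum((w - target) ** 2))   # "validation" loss

# Metagradient of the post-training loss w.r.t. the learning rate.
lr, eps = 0.01, 1e-5
metagrad = (train_then_eval(lr + eps) - train_then_eval(lr - eps)) / (2 * eps)
print(f"d(val loss)/d(lr) at lr={lr}: {metagrad:.3f}")
```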
- Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart [1.5508907979229383]
Quantization-aware training (QAT) is a common paradigm for network quantization.
The low-precision model exhibits limited representation capabilities and cannot directly replicate full-precision calculations.
We propose a general QAT framework for alleviating the concerns by permitting the forward and backward processes of the low-precision network to be guided by the full-precision partner.
arXiv Detail & Related papers (2024-12-20T12:38:18Z)
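The mechanism underlying QAT, which the entry above builds on, is the straight-through estimator: the loss is evaluated at quantized weights while updates are applied to full-precision latent weights. A minimal sketch on a least-squares toy problem (not the paper's block-replacement scheme):

```python
import numpy as np

def fake_quant(w, step=0.1):
    """Forward-pass quantization onto a uniform grid."""
    return np.round(w / step) * step

# Straight-through estimator: the gradient is computed at the *quantized*
# weights but the update is applied to the *latent* full-precision weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 8))
y = X @ rng.normal(size=8)
w = np.zeros(8)                                # latent full-precision weights
for _ in range(200):
    wq = fake_quant(w)                         # what inference would use
    grad = 2.0 * X.T @ (X @ wq - y) / len(y)   # gradient at quantized point
    w -= 0.05 * grad                           # STE: pass it straight through
print(np.mean((X @ fake_quant(w) - y) ** 2))   # low quantized-model loss
```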
- Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task [0.0]
We introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model. Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable.
arXiv Detail & Related papers (2024-12-05T06:34:06Z)
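PACT, used in the entry above, clips activations at a learnable threshold alpha before quantizing them, and learns alpha from the inputs that hit the clip. A small sketch under assumed names and a 4-bit grid:

```python
import numpy as np

def pact_forward(x, alpha, bits=4):
    """Clip activations to [0, alpha], then quantize onto 2**bits - 1 levels."""
    y = np.clip(x, 0.0, alpha)
    step = alpha / (2 ** bits - 1)
    return np.round(y / step) * step

def pact_alpha_grad(x, alpha, upstream):
    """d(output)/d(alpha) is 1 wherever the input was clipped at alpha
    (straight-through for the rounding), 0 elsewhere."""
    return float(np.sum(upstream * (x >= alpha)))

x = np.random.default_rng(0).normal(loc=1.0, size=1000)
alpha = 2.0
print(pact_forward(x, alpha)[:5])
print(pact_alpha_grad(x, alpha, np.ones_like(x)))
```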
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate the memory and computational demands of large language models by compressing and accelerating them.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
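The Ordered Weighted Averaging objective mentioned above applies weights by rank rather than by index; under the common convention sketched here, decreasing weights emphasize the worst outcomes:

```python
import numpy as np

def owa(values, weights):
    """Ordered Weighted Average: weights are applied by rank, not by index.

    With weights sorted in decreasing order, the largest weight always
    multiplies the worst (smallest) outcome -- a fairness-style objective.
    """
    return float(np.sort(values) @ np.sort(weights)[::-1])

outcomes = np.array([0.9, 0.2, 0.5])   # per-group utilities
weights = np.array([0.5, 0.3, 0.2])    # emphasize the worst-off group
print(owa(outcomes, weights))          # 0.5*0.2 + 0.3*0.5 + 0.2*0.9 = 0.43
```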
- Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
- DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training [33.11416096294998]
Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems.
No prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance.
We develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch.
arXiv Detail & Related papers (2023-10-03T13:05:36Z)
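The zeroth-order optimization that DeepZero scales up replaces backpropagation with function evaluations. A generic coordinate-wise finite-difference estimator (the basic building block, not DeepZero's optimized variant):

```python
import numpy as np

def zo_gradient(f, w, mu=1e-3):
    """Coordinate-wise zeroth-order gradient estimate of f at w.

    Uses one forward difference per coordinate, so it needs dim(w) + 1
    function evaluations and no backpropagation at all.
    """
    base = f(w)
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = mu
        grad[i] = (f(w + e) - base) / mu
    return grad

f = lambda w: float(np.sum((w - 3.0) ** 2))
w = np.zeros(5)
for _ in range(100):
    w -= 0.1 * zo_gradient(f, w)
print(w)   # approaches 3.0 in every coordinate
```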
- QuantEase: Optimization-based Quantization for Language Models [17.333778751252392]
This work introduces QuantEase, a layer-wise Post-Training Quantization (PTQ) framework for Large Language Models (LLMs).
Our Coordinate Descent (CD)-based approach features straightforward updates, relying solely on vector operations.
We also explore an outlier-aware variant, allowing significant weights (outliers) to be retained in full precision.
arXiv Detail & Related papers (2023-09-05T01:39:09Z)
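The coordinate-descent idea behind QuantEase can be sketched for a single weight column: each coordinate has a closed-form minimizer of the layer-wise reconstruction error given the others, which is then snapped to the quantization grid. A simplified illustration under assumed notation, not the paper's exact algorithm:

```python
import numpy as np

def cd_quantize(w, H, step, sweeps=10):
    """Quantize one weight column w to a uniform grid, minimizing
    (w - q)^T H (w - q), where H = X^T X comes from calibration data.

    Each sweep visits every coordinate, solves the 1-D problem exactly
    with the others fixed, and rounds the solution to the grid.
    """
    q = np.round(w / step) * step                  # start from naive rounding
    for _ in range(sweeps):
        for k in range(w.size):
            q_star = q[k] + H[k] @ (w - q) / H[k, k]
            q[k] = np.round(q_star / step) * step
    return q

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))                     # calibration activations
H = X.T @ X
w = rng.normal(size=16)
q = cd_quantize(w, H, step=0.25)
naive = np.round(w / 0.25) * 0.25
err = lambda v: float((w - v) @ H @ (w - v))
print(err(naive), err(q))   # reconstruction error: CD is typically lower
```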
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
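The zero-shot structured pruning argued for above can be pictured as ranking whole convolutional filters by magnitude and keeping the strongest, with no retraining; the shapes and keep ratio below are illustrative:

```python
import numpy as np

def prune_filters(conv_w, keep_ratio=0.8):
    """Zero-shot structured pruning: drop whole output filters by L1 norm.

    conv_w has shape (out_channels, in_channels, kH, kW); the filters with
    the smallest L1 norms are removed, shrinking the layer's compute cost
    roughly in proportion, without any retraining.
    """
    norms = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])   # strongest filters
    return conv_w[keep], keep

w = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
pruned, kept = prune_filters(w, keep_ratio=0.8)
print(pruned.shape)   # (51, 32, 3, 3)
```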
- Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimation.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z)