PARQ: Piecewise-Affine Regularized Quantization
- URL: http://arxiv.org/abs/2503.15748v1
- Date: Wed, 19 Mar 2025 23:38:49 GMT
- Title: PARQ: Piecewise-Affine Regularized Quantization
- Authors: Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao
- Abstract summary: We show that convex, piecewise-affine regularization (PAR) can effectively induce parameters to cluster towards discrete values. We minimize PAR-regularized loss functions using an aggregate proximal stochastic gradient method (AProx) and prove that it has last-iterate convergence.
- Score: 27.797664437344768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a principled method for quantization-aware training (QAT) of large-scale machine learning models. Specifically, we show that convex, piecewise-affine regularization (PAR) can effectively induce the model parameters to cluster towards discrete values. We minimize PAR-regularized loss functions using an aggregate proximal stochastic gradient method (AProx) and prove that it has last-iterate convergence. Our approach provides an interpretation of the straight-through estimator (STE), a widely used heuristic for QAT, as the asymptotic form of PARQ. We conduct experiments to demonstrate that PARQ obtains competitive performance on convolution- and transformer-based vision tasks.
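To make the idea concrete, below is a hedged sketch of quantization-aware training with a piecewise-affine penalty on a toy least-squares problem. The distance-to-nearest-level regularizer and the plain proximal SGD loop are simplified stand-ins chosen for illustration; they are not the paper's convex PAR construction or the AProx method.

```python
# Toy sketch only: a piecewise-affine "distance to nearest level" penalty
# with a plain proximal SGD loop; PARQ's convex PAR and AProx are richer.
import numpy as np

def nearest_level(w, grid):
    """Map each weight to its nearest quantization level."""
    idx = np.abs(w[:, None] - grid).argmin(axis=1)
    return grid[idx]

def prox_par(w, grid, lam):
    """Prox of lam * |w - nearest level|: pull each weight toward its
    nearest level by at most lam."""
    q = nearest_level(w, grid)
    return w - np.clip(w - q, -lam, lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.choice([-1.0, 0.0, 1.0], size=8)      # ternary ground truth
y = X @ w_true + 0.01 * rng.normal(size=256)

grid = np.array([-1.0, 0.0, 1.0])
w, lr = np.zeros(8), 0.05
for step in range(500):
    i = rng.integers(0, 256, size=32)               # minibatch indices
    grad = X[i].T @ (X[i] @ w - y[i]) / 32          # stochastic gradient
    lam = 0.01 * (step + 1) / 500                   # ramp up the penalty
    w = prox_par(w - lr * grad, grid, lam * lr)     # prox-gradient step

print(np.round(w, 3))   # weights end up clustered near {-1, 0, 1}
```

Ramping the penalty strength over training mirrors the common QAT practice of tightening the quantization pressure as the weights settle.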
Related papers
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years. Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
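As a rough illustration of Fisher-guided quantization (not FIMA-Q's actual approximation), the sketch below estimates a diagonal empirical Fisher from calibration batches and uses it as a second-order proxy for the loss change caused by quantizing weights; all helper names here are hypothetical.

```python
# Diagonal empirical Fisher as a per-weight sensitivity score; a common
# simplification, assumed here for illustration rather than FIMA-Q itself.
import torch

def diagonal_fisher(model, loss_fn, batches):
    """Estimate diag(F) ~ E[grad^2] from a list of calibration batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(batches) for n, f in fisher.items()}

def quant_cost(fisher, weights, quantized):
    """Second-order proxy for loss change: sum_i F_ii * (w_i - q_i)^2."""
    return sum((fisher[n] * (weights[n] - quantized[n]) ** 2).sum()
               for n in weights)
```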
arXiv Detail & Related papers (2025-06-13T07:57:38Z) - Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix [0.7261171488281837]
We find that the sharp loss landscape, which leads to a dramatic performance drop, is an essential factor causing instability. We propose Feature-Perturbed Quantization (FPQ), which generalizes and applies feature distillation to the quantized model.
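A minimal sketch of the feature-distillation ingredient, assuming the full-precision teacher and the quantized student expose matched intermediate features; FPQ's feature perturbation and exact loss weighting are not reproduced here.

```python
# Feature distillation between matched intermediate activations; the
# teacher (full-precision) side is detached so only the student learns.
import torch.nn.functional as F

def feature_distill_loss(teacher_feats, student_feats, weight=1.0):
    return weight * sum(F.mse_loss(s, t.detach())
                        for s, t in zip(student_feats, teacher_feats))
```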
arXiv Detail & Related papers (2025-03-14T07:56:20Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
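For reference, here is a sketch of systematic resampling, the standard SMC building block the summary refers to; the Feynman-Kac weight corrections that give the paper its name are not shown.

```python
# Systematic resampling: one uniform draw, stratified positions, then a
# searchsorted over the weight CDF; a standard SMC step, not the paper's code.
import numpy as np

def systematic_resample(weights, rng):
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n     # stratified points
    return np.searchsorted(np.cumsum(weights / weights.sum()), positions)

rng = np.random.default_rng(0)
print(systematic_resample(np.array([0.1, 0.4, 0.2, 0.3]), rng))
```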
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE). RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
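Because both RoSTE and PARQ revolve around the straight-through estimator, here is a sketch of vanilla STE in PyTorch: hard rounding on the forward pass, identity gradient on the backward pass. The rotation strategy that distinguishes RoSTE is omitted.

```python
# Vanilla STE: round() has zero gradient almost everywhere, so the backward
# pass pretends the rounding was the identity map.
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)      # quantize on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output         # pass gradients straight through

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True)
RoundSTE.apply(x).sum().backward()
print(x.grad)                      # all ones despite the hard rounding
```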
arXiv Detail & Related papers (2025-02-13T06:44:33Z) - Efficient Quantum Gradient and Higher-order Derivative Estimation via Generalized Hadamard Test [2.5545813981422882]
Gradient-based methods are crucial for understanding the behavior of parameterized quantum circuits (PQCs).
Existing gradient estimation methods, such as Finite Difference, Shift Rule, Hadamard Test, and Direct Hadamard Test, often yield suboptimal gradient circuits for certain PQCs.
We introduce the Flexible Hadamard Test, which, when applied to first-order gradient estimation methods, can invert the roles of ansatz generators and observables.
We also introduce Quantum Automatic Differentiation (QAD), a unified gradient method that adaptively selects the best gradient estimation technique for individual parameters within a PQC.
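For context on the estimators listed above, here is a sketch of the standard parameter-shift rule on a single-qubit toy circuit. The closed-form cosine expectation is an assumption used to keep the example library-free; the Flexible Hadamard Test and QAD are not reproduced.

```python
# Parameter-shift rule for gates generated by Pauli operators; the toy
# expectation <0| RY(t)^dag Z RY(t) |0> = cos(t) stands in for a real PQC.
import numpy as np

def expectation(theta):
    return np.cos(theta)

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Exact gradient: (f(t + s) - f(t - s)) / (2 sin s) with s = pi/2."""
    return (f(theta + shift) - f(theta - shift)) / (2 * np.sin(shift))

theta = 0.7
print(parameter_shift_grad(expectation, theta), -np.sin(theta))  # match
```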
arXiv Detail & Related papers (2024-08-10T02:08:54Z) - Functional Partial Least-Squares: Adaptive Estimation and Inference [0.0]
We show that the functional partial least squares (PLS) estimator attains nearly minimax-optimal convergence rates over a class of ellipsoids. We apply our methodology to evaluate the nonlinear effects of temperature on corn and soybean yields.
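For orientation, a minimal sketch of ordinary multivariate PLS with scikit-learn, treating discretized curves as high-dimensional covariates; the paper's functional PLS estimator and its adaptive tuning go beyond this classical stand-in.

```python
# Classical PLS regression on discretized "curves"; a stand-in for the
# functional PLS estimator, which handles functional covariates directly.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # 200 curves on a 50-point grid
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)

pls = PLSRegression(n_components=3).fit(X, y)
print(pls.score(X, y))                         # R^2 of the low-rank fit
```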
arXiv Detail & Related papers (2024-02-16T23:47:47Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
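A minimal sketch of spectral normalization as a variance-control device, using PyTorch's built-in hook on a stand-in linear layer; how the paper applies it inside model-based RP policy gradients is not shown.

```python
# Spectral normalization constrains a layer's largest singular value to ~1,
# bounding how much long chains of such maps can amplify gradients.
import torch
import torch.nn as nn

layer = nn.utils.spectral_norm(nn.Linear(64, 64))  # power-iteration hook

x = torch.randn(8, 64)
for _ in range(20):        # each forward (train mode) refines the estimate
    _ = layer(x)

sigma = torch.linalg.matrix_norm(layer.weight, ord=2)
print(float(sigma))        # ~1.0: the layer is approximately 1-Lipschitz
```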
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - A kernel-based quantum random forest for improved classification [0.0]
Quantum Machine Learning (QML) approaches that aim to enhance traditional classical learning methods have faced various limitations to their realisation.
We extend the linear quantum support vector machine (QSVM) with a kernel function computed through quantum kernel estimation (QKE).
To limit overfitting, we further extend the model to employ a low-rank Nyström approximation to the kernel matrix.
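A minimal sketch of the Nyström low-rank approximation mentioned above, with a classical RBF kernel standing in for the quantum kernel computed via QKE.

```python
# Nystrom: approximate the full n x n kernel from m << n landmark columns,
# K ~ K_nm K_mm^+ K_nm^T; an RBF kernel stands in for the quantum kernel.
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
landmarks = X[rng.choice(300, size=30, replace=False)]

K_nm = rbf(X, landmarks)
K_approx = K_nm @ np.linalg.pinv(rbf(landmarks, landmarks)) @ K_nm.T

K_full = rbf(X, X)
print(np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```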
arXiv Detail & Related papers (2022-10-05T15:57:31Z) - A Convergence Theory for Over-parameterized Variational Quantum
Eigensolvers [21.72347971869391]
The Variational Quantum Eigensolver (VQE) is a promising candidate for quantum applications on near-term Noisy Intermediate-Scale Quantum (NISQ) computers.
We provide the first rigorous analysis of the convergence of VQEs in the over-parameterization regime.
arXiv Detail & Related papers (2022-05-25T04:06:50Z) - Counting Phases and Faces Using Bayesian Thermodynamic Integration [77.34726150561087]
We introduce a new approach to reconstructing thermodynamic functions and phase boundaries in two-parametric statistical mechanics systems.
We use the proposed approach to accurately reconstruct the partition functions and phase diagrams of the Ising model and the exactly solvable non-equilibrium TASEP.
arXiv Detail & Related papers (2022-05-18T17:11:23Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
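For contrast with the flow-based estimator, here is a sketch of the naive Monte Carlo baseline for the probability mass of a closed region, with a standard normal standing in for samples from a trained flow.

```python
# Naive Monte Carlo baseline: the fraction of samples landing inside the
# region estimates its probability mass; a stand-in normal replaces the flow.
import numpy as np

def mc_region_mass(sample_fn, in_region, n=100_000, seed=0):
    x = sample_fn(np.random.default_rng(seed), n)
    return in_region(x).mean()

sample = lambda rng, n: rng.normal(size=(n, 2))   # stand-in for flow samples
box = lambda x: np.all(np.abs(x) < 1.0, axis=1)   # closed region [-1, 1]^2
print(mc_region_mass(sample, box))                # ~0.466 for the 2D normal
```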
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Online Statistical Inference for Stochastic Optimization via
Kiefer-Wolfowitz Methods [8.890430804063705]
We first present the asymptotic distribution of the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimators.
The distributional result reflects the trade-off between statistical efficiency and function query complexity.
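A minimal sketch of the classical Kiefer-Wolfowitz iteration underlying the AKW estimators: gradients built from noisy function queries via central finite differences. The averaging step and the paper's inference procedure are not shown, and the step-size schedules here are illustrative.

```python
# Kiefer-Wolfowitz: optimize with only noisy function evaluations by forming
# coordinate-wise central-difference gradients; schedules a_k, c_k shrink.
import numpy as np

def kw_step(f, theta, a_k, c_k, rng):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = c_k
        grad[i] = (f(theta + e, rng) - f(theta - e, rng)) / (2 * c_k)
    return theta - a_k * grad

# Noisy quadratic with minimum at (1, -2); each query costs two evaluations.
f = lambda t, rng: ((t - np.array([1.0, -2.0])) ** 2).sum() + 0.01 * rng.normal()

rng, theta = np.random.default_rng(0), np.zeros(2)
for k in range(1, 2001):
    theta = kw_step(f, theta, a_k=1.0 / k, c_k=k ** -0.25, rng=rng)
print(np.round(theta, 2))          # close to the true minimizer (1, -2)
```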
arXiv Detail & Related papers (2021-02-05T19:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.