Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning
- URL: http://arxiv.org/abs/2507.04247v1
- Date: Sun, 06 Jul 2025 05:17:28 GMT
- Title: Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning
- Authors: Qiang Heng, Caixing Wang,
- Abstract summary: We introduce the Quadratic Majorization Minimization with Extrapolation (QMME) framework and establish its sequential convergence properties.<n>To demonstrate practical advantages, we apply QMME to large-scale kernel regularized learning problems.
- Score: 1.0282274843007797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: First-order methods in convex optimization offer low per-iteration cost but often suffer from slow convergence, while second-order methods achieve fast local convergence at the expense of costly Hessian inversions. In this paper, we highlight a middle ground: minimizing a quadratic majorant with fixed curvature at each iteration. This strategy strikes a balance between per-iteration cost and convergence speed, and crucially allows the reuse of matrix decompositions, such as Cholesky or spectral decompositions, across iterations and varying regularization parameters. We introduce the Quadratic Majorization Minimization with Extrapolation (QMME) framework and establish its sequential convergence properties under standard assumptions. The new perspective of our analysis is to center the arguments around the induced norm of the curvature matrix $H$. To demonstrate practical advantages, we apply QMME to large-scale kernel regularized learning problems. In particular, we propose a novel Sylvester equation modelling technique for kernel multinomial regression. In Julia-based experiments, QMME compares favorably against various established first- and second-order methods. Furthermore, we demonstrate that our algorithms complement existing kernel approximation techniques through more efficiently handling sketching matrices with large projection dimensions. Our numerical experiments and real data analysis are available and fully reproducible at https://github.com/qhengncsu/QMME.jl.
Related papers
- The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines [1.738375118265695]
This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs)
It not only achieves faster iteration per iteration but also exhibits enhanced convergence when compared to conventional SFO techniques.
Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods.
arXiv Detail & Related papers (2024-07-30T17:03:19Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates [17.777466668123886]
We introduce PROMISE ($textbfPr$econditioned $textbfO$ptimization $textbfM$ethods by $textbfI$ncorporating $textbfS$calable Curvature $textbfE$stimates), a suite of sketching-based preconditioned gradient algorithms.
PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha.
arXiv Detail & Related papers (2023-09-05T07:49:10Z) - Simplifying Momentum-based Positive-definite Submanifold Optimization
with Applications to Deep Learning [24.97120654216651]
We show how to solve difficult differential equations with momentum on a submanifold.
We do so by proposing a generalized version of the Riemannian normal coordinates.
We use our approach to simplify existing approaches for structured covariances and develop matrix-inverse-free $2textnd$orders for deep learning with low precision by using only matrix multiplications.
arXiv Detail & Related papers (2023-02-20T03:31:11Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It simultaneously can reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling.<n>We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - On Distributed Non-convex Optimization: Projected Subgradient Method For
Weakly Convex Problems in Networks [13.385373310554327]
The Moreau subgradient method converges linear sharpness problems in machine learning.
A distributed implementation of the subgradient method with a theoretical guarantee is proposed.
arXiv Detail & Related papers (2020-04-28T01:01:49Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.