Stochastic regularized majorization-minimization with weakly convex and
multi-convex surrogates
- URL: http://arxiv.org/abs/2201.01652v3
- Date: Tue, 21 Mar 2023 07:35:09 GMT
- Title: Stochastic regularized majorization-minimization with weakly convex and
multi-convex surrogates
- Authors: Hanbaek Lyu
- Abstract summary: We show that the first-order optimality gap of the proposed algorithm decays at explicit rates for both the empirical and the expected loss under a general nonconvex dependent data setting.
We obtain the first convergence rate bounds for various optimization methods in this setting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic majorization-minimization (SMM) is a class of stochastic
optimization algorithms that proceed by sampling new data points and minimizing
a recursive average of surrogate functions of an objective function. The
surrogates are required to be strongly convex, and no convergence rate analysis
was available for the general non-convex setting. In this paper, we propose an
extension of SMM where surrogates are allowed to be only weakly convex or block
multi-convex, and the averaged surrogates are approximately minimized with
proximal regularization or block-minimized within diminishing radii,
respectively. For the general nonconvex constrained setting with non-i.i.d.
data samples, we show that the first-order optimality gap of the proposed
algorithm decays at the rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the
empirical loss and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss,
where $n$ denotes the number of data samples processed. Under some additional
assumption, the latter convergence rate can be improved to $O((\log
n)^{1+\epsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate
bounds for various optimization methods under the general nonconvex dependent data
setting: Double-averaging projected gradient descent and its generalizations,
proximal point empirical risk minimization, and online matrix/tensor
decomposition algorithms. We also provide experimental validation of our
results.
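To make the proximally regularized update concrete, here is a minimal NumPy sketch for the special case in which each per-sample surrogate is the standard quadratic majorizer of an $L$-smooth loss at the current iterate. The function name `smm_prox`, the averaging weights $w_n = 1/n$, the proximal parameter `lam`, and the box constraint are illustrative assumptions rather than the paper's exact algorithm or constants.

```python
# A minimal sketch of prox-regularized stochastic MM with quadratic surrogates
# (illustrative assumptions; not the paper's exact algorithm).
import numpy as np

def smm_prox(grad, x_stream, theta0, L=1.0, lam=1.0, radius=10.0, n_iter=1000):
    """grad(theta, x): gradient of the per-sample loss; x_stream yields data points."""
    theta = theta0.copy()
    a = np.zeros_like(theta0)   # running average of per-sample gradients grad(theta_{k-1}, x_k)
    b = theta0.copy()           # running average of the anchor points theta_{k-1}
    for n, x in zip(range(1, n_iter + 1), x_stream):
        w = 1.0 / n                              # averaging weight w_n = 1/n
        a = (1.0 - w) * a + w * grad(theta, x)
        b = (1.0 - w) * b + w * theta
        # Averaged quadratic surrogate: gbar_n(t) = const + (a - L*b)^T t + (L/2)||t||^2.
        # Minimizing gbar_n(t) + (lam/2)||t - theta||^2 gives the closed-form step below.
        theta = (L * b - a + lam * theta) / (L + lam)
        theta = np.clip(theta, -radius, radius)  # projection onto a box constraint set
    return theta

# Example: streaming least squares with per-sample loss 0.5*(u @ theta - y)^2.
rng = np.random.default_rng(0)
stream = ((rng.normal(size=5), rng.normal()) for _ in range(2000))
theta_hat = smm_prox(lambda th, uy: (uy[0] @ th - uy[1]) * uy[0],
                     stream, theta0=np.zeros(5), L=5.0)
```

The closed-form step is specific to quadratic surrogates; for general weakly convex surrogates the proximally regularized averaged surrogate is only approximately minimized, as described in the abstract.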
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance reduced ZO estimators for complex gradient problems.
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left(\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed oracle avoids the need for computationally expensive spectral methods, using a simple gradient step and handling outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z) - Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum [30.01198677588252]
First-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ complexity to find an $\varepsilon$-stationary point.
We introduce novel algorithms utilizing bias-corrected momentum to achieve improved complexity.
The effectiveness of the method is validated through robust logistic regression using real-world datasets.
arXiv Detail & Related papers (2024-06-18T20:14:52Z) - Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient [40.22217106270146]
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.
In composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation.
This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
arXiv Detail & Related papers (2024-05-28T02:27:53Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective
and Improved Bounds [12.699376765058137]
Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning.
It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed.
We prove fine-grained complexity bounds that depend on the data matrix and are never worse than what is predicted by the existing bounds.
arXiv Detail & Related papers (2023-06-21T18:14:44Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - Accelerated Single-Call Methods for Constrained Min-Max Optimization [5.266784779001398]
Existing methods either require two gradient calls or two projections in each iteration, which may be costly in some applications.
In this paper, we first show that a variant of the Optimistic Gradient method (RGOG) enjoys convergence rates for a rich class of non-comonotone constrained min-max problems.
Our convergence rates hold for standard optimality measures such as the natural residual.
arXiv Detail & Related papers (2022-10-06T17:50:42Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Block majorization-minimization with diminishing radius for constrained
nonconvex optimization [9.907540661545328]
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that minimizes majorizing surrogates in each block within a diminishing radius (a schematic sketch of this block update appears after this list).
We show that BMM can produce an $\epsilon$-approximate stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations when convex surrogates are used.
arXiv Detail & Related papers (2020-12-07T07:53:09Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of gradient-free minimax optimization problems.
We show that a novel zeroth-order variance-reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
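To complement the prox-regularized sketch above, here is a schematic sketch of the diminishing-radius block update used in the block multi-convex case and in the BMM paper listed above. The helper `block_mm_diminishing_radius`, the quadratic form assumed for each block surrogate, the radius schedule $r_n = c/\sqrt{n}$, and the simple step truncation (in place of the exact ball-constrained block minimization) are all illustrative assumptions.

```python
# A schematic sketch of block minimization within diminishing radii
# (illustrative assumptions; not the papers' exact algorithm).
import numpy as np

def block_mm_diminishing_radius(theta_blocks, block_quadratics, c=1.0, n_iter=100):
    """theta_blocks: list of initial block variables.
    block_quadratics(n, theta): returns, for each block i, a pair (A_i, b_i) so that the
    averaged surrogate restricted to block i is 0.5 * t^T A_i t - b_i^T t (A_i SPD)."""
    theta = [t.copy() for t in theta_blocks]
    for n in range(1, n_iter + 1):
        r_n = c / np.sqrt(n)                          # diminishing search radius
        for i, (A, b) in enumerate(block_quadratics(n, theta)):
            step = np.linalg.solve(A, b) - theta[i]   # step to the unconstrained block minimizer
            norm = np.linalg.norm(step)
            if norm > r_n:
                # Truncate the step to the radius-r_n ball around the previous block iterate;
                # a simple stand-in for the exact ball-constrained block minimization.
                step *= r_n / norm
            theta[i] = theta[i] + step
    return theta
```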
This list is automatically generated from the titles and abstracts of the papers in this site.