Randomized Bregman Coordinate Descent Methods for Non-Lipschitz
Optimization
- URL: http://arxiv.org/abs/2001.05202v1
- Date: Wed, 15 Jan 2020 09:57:38 GMT
- Title: Randomized Bregman Coordinate Descent Methods for Non-Lipschitz
Optimization
- Authors: Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu
- Abstract summary: A new \textit{randomized Bregman (block) coordinate descent} (CD) method is proposed.
We show that the iteration complexity of the proposed method is $O(n\epsilon^{-2})$ to achieve an $\epsilon$-stationary point, where $n$ is the number of blocks of coordinates.
- Score: 31.474280642125734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new \textit{randomized Bregman (block) coordinate descent}
(RBCD) method for minimizing a composite problem, where the objective function
could be either convex or nonconvex, and the smooth part is freed from the
global Lipschitz-continuous (partial) gradient assumption. Under the notion of
relative smoothness based on the Bregman distance, we prove that every limit
point of the generated sequence is a stationary point. Further, we show that
the iteration complexity of the proposed method is $O(n\epsilon^{-2})$ to
achieve an $\epsilon$-stationary point, where $n$ is the number of blocks of
coordinates. If the objective is assumed to be convex, the iteration complexity
is improved to $O(n\epsilon^{-1} )$. If, in addition, the objective is strongly
convex (relative to the reference function), the global linear convergence rate
is recovered. We also present the accelerated version of the RBCD method, which
attains an $O(n\epsilon^{-1/\gamma})$ iteration complexity for the convex
case, where the scalar $\gamma\in [1,2]$ is determined by the
\textit{generalized translation variant} of the Bregman distance. Convergence
analysis without assuming a globally Lipschitz-continuous (partial) gradient
sets our results apart from existing works on composite problems.
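To picture the scheme, recall that the Bregman distance of a reference function $h$ is $D_h(u, x) = h(u) - h(x) - \langle \nabla h(x), u - x\rangle$, and each RBCD iteration updates one randomly chosen block via a Bregman proximal step driven by the partial gradient. Below is a minimal Python sketch of that structure, assuming the entropy reference $h(x) = \sum_j x_j \log x_j$ so the per-block step has a closed form (an exponentiated-gradient update), a fixed block step size, and a toy separable objective; the paper's actual reference functions, step-size rules, and nonsmooth term are not reproduced here.

import numpy as np

# Minimal sketch of a randomized Bregman (block) coordinate descent iteration.
# Assumptions (not from the paper): entropy reference h(x) = sum_j x_j log x_j,
# so the block step  argmin_u <g, u> + (1/alpha) D_h(u, x_i)  has the closed form
# u_j = x_{i,j} * exp(-alpha * g_j); fixed step size; no nonsmooth term.

def entropy_bregman_step(x_block, grad_block, alpha):
    return x_block * np.exp(-alpha * grad_block)

def rbcd(partial_grad, x0_blocks, alpha=0.1, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = [np.asarray(b, dtype=float).copy() for b in x0_blocks]
    n = len(x)
    for _ in range(iters):
        i = rng.integers(n)                # pick one block uniformly at random
        g = partial_grad(x, i)             # partial gradient w.r.t. block i
        x[i] = entropy_bregman_step(x[i], g, alpha)
    return x

# Toy usage (hypothetical objective): f(x) = 0.5 * sum_i ||x_i - b_i||^2,
# minimized over the positive orthant, split into two blocks.
b = [np.array([1.0, 2.0]), np.array([0.5])]
x_hat = rbcd(lambda x, i: x[i] - b[i], x0_blocks=[np.ones(2), np.ones(1)])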
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for complex gradient problems.
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity [24.45688490844496]
The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical iteration complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization.
arXiv Detail & Related papers (2023-05-23T15:26:36Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of a general nonconvex conic optimization problem that minimizes a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - On the Complexity of Finding Small Subgradients in Nonsmooth
Optimization [31.714928102950584]
We show that no dimension-free rate can be achieved by a deterministic algorithm.
We show how the convergence rate of finding $(\delta,\epsilon)$-stationary points can be improved in case the function is convex.
arXiv Detail & Related papers (2022-09-21T13:30:00Z) - Thinking Outside the Ball: Optimal Learning with Gradient Descent for
Generalized Linear Stochastic Convex Optimization [37.177329562964765]
We consider linear prediction with a convex Lipschitz loss, or more generally, convex optimization problems of generalized linear form.
We show that in this setting, early-stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$.
But instead of uniform convergence in a norm ball, which we show can guarantee suboptimal learning using $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
arXiv Detail & Related papers (2022-02-27T09:41:43Z) - Generalized Optimistic Methods for Convex-Concave Saddle Point Problems [24.5327016306566]
The optimistic method has seen increasing popularity for solving convex-concave saddle point problems.
We develop a backtracking line search scheme to select the step sizes without knowledge of coefficients.
arXiv Detail & Related papers (2022-02-19T20:31:05Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps (a minimal sketch of this alternation is given after this list).
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
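As a companion to the primal-dual entry above (the adaptive Condat-Vu paper), here is a minimal, non-adaptive sketch of the alternation between primal gradient steps and dual proximal steps for $\min_x f(x) + g(Ax)$. The concrete instance $f(x) = \frac{1}{2}\|x - b\|^2$ and $g(z) = \lambda\|z\|_1$ (so the prox of $g^*$ is a clip onto $[-\lambda, \lambda]$), the fixed step sizes, and the function names are assumptions for illustration; the paper's adaptive step-size rule is not reproduced.

import numpy as np

# Non-adaptive Condat-Vu-style iteration for
#   min_x f(x) + g(Ax)  <=>  min_x max_y f(x) + <Ax, y> - g*(y).
# Illustrative instance (assumption): f(x) = 0.5*||x - b||^2, g(z) = lam*||z||_1,
# hence prox_{sigma g*}(v) = clip(v, -lam, lam). Fixed step sizes only.

def primal_dual(A, b, lam, tau, sigma, iters=500):
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x_new = x - tau * ((x - b) + A.T @ y)   # primal gradient step (grad f(x) = x - b)
        v = y + sigma * A @ (2 * x_new - x)     # dual step on the extrapolated point
        y = np.clip(v, -lam, lam)               # prox of sigma * g*: project onto the inf-ball
        x = x_new
    return x

# Usage on a small random problem; step sizes chosen conservatively
# (consult the papers for the exact admissible ranges and the adaptive rule).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_hat = primal_dual(A, b=rng.standard_normal(10), lam=0.1, tau=0.01, sigma=0.5)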