On Finite-Step Convergence of the Non-Greedy Algorithm and Proximal
Alternating Minimization Method with Extrapolation for $L_1$-Norm PCA
- URL: http://arxiv.org/abs/2302.07712v3
- Date: Wed, 22 Mar 2023 09:57:57 GMT
- Title: On Finite-Step Convergence of the Non-Greedy Algorithm and Proximal
Alternating Minimization Method with Extrapolation for $L_1$-Norm PCA
- Authors: Yuning Yang
- Abstract summary: The non-greedy algorithm (NGA) and the recently proposed proximal alternating minimization method with extrapolation (PAMe) for $L_1$-norm PCA are studied.
It is shown that the iterative points generated by the modified algorithm will not change after at most $\left\lceil\frac{2F^{\max}}{\tau}\right\rceil$ steps.
For PAMe, it is proved that the sign variable will remain constant after finitely many steps and the algorithm can output a point satisfying certain optimality conditions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical non-greedy algorithm (NGA) and the recently proposed proximal
alternating minimization method with extrapolation (PAMe) for $L_1$-norm PCA
are revisited and their finite-step convergence is studied. It is first shown
that NGA can be interpreted as a conditional subgradient or an alternating
maximization method. By recognizing it as a conditional subgradient, we prove
that the iterative points generated by the algorithm will be constant in
finitely many steps under a certain full-rank assumption; such an assumption
can be removed when the projection dimension is one. By treating the algorithm
as an alternating maximization, we then prove that the objective value will be
fixed after at most $\left\lceil\frac{F^{\max}}{\tau_0} \right\rceil$ steps,
where the stopping point satisfies certain optimality conditions. Then, a
slight modification of NGA with improved convergence properties is analyzed. It
is shown that the iterative points generated by the modified algorithm will not
change after at most $\left\lceil\frac{2F^{\max}}{\tau} \right\rceil$ steps;
furthermore, the stopping point satisfies certain optimality conditions if the
proximal parameter $\tau$ is small enough.
For PAMe, it is proved that the sign variable will remain constant after
finitely many steps and the algorithm can output a point satisfying certain
optimality conditions, if the parameters are small enough and a full-rank
assumption is satisfied. Moreover, if there is no proximal term on the
projection matrix related subproblem, then the iterative points generated by
this modified algorithm will not change after at most $\left\lceil
\frac{4F^{\max}}{\tau(1-\gamma)} \right\rceil$ steps and the stopping point
also satisfies certain optimality conditions, provided similar assumptions as
those for PAMe. The full-rank assumption can be removed when the projection
dimension is one.
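For concreteness, the NGA iteration analyzed here alternates a sign step with an SVD-based orthogonal Procrustes step. Below is a minimal NumPy sketch, assuming the usual formulation $\max_{U^\top U=I}\|X^\top U\|_1$ for a data matrix $X\in\mathbb{R}^{d\times n}$; the function name, initialization, and stopping test are illustrative, not the paper's code.

```python
import numpy as np

def nga_l1_pca(X, r, max_iter=100, seed=0):
    """Sketch of the classical non-greedy algorithm (NGA) for L1-norm PCA:
        maximize ||X^T U||_1  subject to  U^T U = I,
    with X of shape (d, n) and U of shape (d, r)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((X.shape[0], r)))  # orthonormal start
    prev_obj = -np.inf
    for _ in range(max_iter):
        S = np.sign(X.T @ U)                   # sign step (ties broken as +1)
        S[S == 0] = 1.0
        # Maximization step: U = argmax tr(U^T X S) over U^T U = I,
        # solved in closed form via the SVD of X S (orthogonal Procrustes).
        P, _, Qt = np.linalg.svd(X @ S, full_matrices=False)
        U = P @ Qt
        obj = np.abs(X.T @ U).sum()
        if obj <= prev_obj:                    # objective is monotone: stop on stall
            break
        prev_obj = obj
    return U
```

Viewed as an alternating maximization, the objective produced by this loop is nondecreasing, which is what underlies the paper's bound of at most $\left\lceil F^{\max}/\tau_0 \right\rceil$ steps until the objective value stops changing.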
Related papers
- On Penalty Methods for Nonconvex Bilevel Optimization and First-Order
Stochastic Approximation [13.813242559935732]
We study first-order algorithms for solving bilevel optimization problems.
In particular, we show a strong connection between the penalty function and the hyper-objective.
We show an improved oracle complexity of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
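To illustrate the penalty connection, a toy sketch: run gradient descent jointly on the penalized objective $f(x,y)+\sigma\,(g(x,y)-v(x))$ with value function $v(x)=\min_y g(x,y)$, whose gradient $\nabla v(x)=\nabla_x g(x,y^*(x))$ is approximated by an inner loop. All names, step sizes, and the toy instance below are assumptions for illustration, not the paper's method.

```python
def penalty_bilevel(grad_f, grad_g, x, y, sigma=10.0, lr=1e-2, inner=20, steps=2000):
    """Illustrative first-order penalty scheme for the bilevel problem
        min_x f(x, y*(x)),  y*(x) = argmin_y g(x, y),
    via gradient descent on  f(x, y) + sigma * (g(x, y) - v(x)),
    where v(x) = min_y g(x, y) and, by Danskin's theorem,
    grad v(x) = grad_x g(x, y*(x)).  grad_f / grad_g return (d/dx, d/dy)."""
    z = y                                   # running approximation of y*(x)
    for _ in range(steps):
        for _ in range(inner):              # refresh the value-function anchor
            z = z - lr * grad_g(x, z)[1]
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        vx = grad_g(x, z)[0]                # ~ grad v(x) at the current x
        x = x - lr * (fx + sigma * (gx - vx))
        y = y - lr * (fy + sigma * gy)
    return x, y

# toy instance: g(x, y) = 0.5*(y - x)^2 (so y*(x) = x) and f(x, y) = 0.5*(y - 1)^2;
# the bilevel optimum is x = y = 1.
x, y = penalty_bilevel(lambda x, y: (0.0, y - 1.0), lambda x, y: (x - y, y - x),
                       x=0.0, y=0.0)
print(round(x, 2), round(y, 2))             # ~ 1.0 1.0
```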
arXiv Detail & Related papers (2023-09-04T18:25:43Z)
- On the Linear Convergence of Policy Gradient under Hadamard Parameterization [4.182089296199263]
We study the convergence of deterministic policy gradient under the Hadamard parameterization.
We show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations.
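Here, the Hadamard parameterization writes the policy as an elementwise square, $\pi=\theta\odot\theta$ with $\|\theta\|_2=1$, so $\pi$ lies on the probability simplex. A bandit-sized sketch of gradient ascent under this parameterization (the reward vector, step size, and sphere retraction are illustrative assumptions, not the paper's exact scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([0.2, 0.5, 1.0])   # hypothetical per-action rewards
theta = rng.standard_normal(3)
theta /= np.linalg.norm(theta)        # ||theta||_2 = 1, so theta**2 is a distribution

eta = 0.1
for _ in range(200):
    grad = 2 * theta * rewards        # gradient of J(theta) = sum_a theta_a^2 * r_a
    theta = theta + eta * grad        # ascent step
    theta /= np.linalg.norm(theta)    # retract to the unit sphere
pi = theta ** 2                       # the Hadamard policy pi = theta (.) theta
print(np.round(pi, 3))                # mass concentrates on the best action
```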
arXiv Detail & Related papers (2023-05-31T05:51:15Z)
- Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps [77.8999425439444]
We present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated up to arbitrary accuracy by a contraction map.
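In symbols, the implicit proximal-point update $z_{k+1}=z_k-\eta F(z_{k+1})$ is the fixed point of $w\mapsto z_k-\eta F(w)$, which is a contraction whenever $\eta L<1$, so it can be approximated by iterating that map (one inner iteration recovers the extra-gradient step). A minimal sketch; the step size and iteration counts are illustrative:

```python
import numpy as np

def approx_ppm(F, z, eta=0.5, inner=10, outer=100):
    """Approximate Proximal Point method for min-max problems with operator
    F(z) = (grad_x f, -grad_y f).  The implicit update z+ = z - eta*F(z+)
    is the fixed point of the contraction w -> z - eta*F(w) (for eta*L < 1),
    approximated here by iterating that map a few times."""
    for _ in range(outer):
        w = z
        for _ in range(inner):        # fixed-point iterations of the contraction
            w = z - eta * F(w)
        z = w
    return z

# bilinear saddle f(x, y) = x*y, so F(z) = (y, -x); the saddle point is (0, 0)
F = lambda z: np.array([z[1], -z[0]])
print(np.round(approx_ppm(F, np.array([1.0, 1.0])), 3))  # ~ [0, 0]
```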
arXiv Detail & Related papers (2023-01-10T12:18:47Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
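For reference, a sketch of the plain fixed-step Condat-Vu iteration that the paper makes adaptive, instantiated for $f(x)=\tfrac12\|x-b\|^2$ and $g=\|\cdot\|_1$, so that $\operatorname{prox}_{\sigma g^*}$ is a clip onto $[-1,1]^m$; the concrete $f$, $g$, and step sizes are assumptions for illustration:

```python
import numpy as np

def condat_vu(A, b, tau=0.1, sigma=0.1, iters=2000):
    """Plain Condat-Vu for min_x max_y f(x) + <Ax, y> - g*(y) with
    f(x) = 0.5*||x - b||^2 and g = ||.||_1, so g* is the indicator of the
    unit infinity-norm ball and prox_{sigma g*} is a clip onto [-1, 1]."""
    x, y = np.zeros(A.shape[1]), np.zeros(A.shape[0])
    for _ in range(iters):
        x_new = x - tau * ((x - b) + A.T @ y)                  # primal gradient step
        y = np.clip(y + sigma * (A @ (2 * x_new - x)), -1, 1)  # dual prox step
        x = x_new
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((8, 5)), rng.standard_normal(5)
x = condat_vu(A, b)   # approximately minimizes 0.5*||x - b||^2 + ||A @ x||_1
```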
arXiv Detail & Related papers (2021-10-28T14:19:30Z)
- A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
- Adaptive extra-gradient methods for min-max optimization and games [35.02879452114223]
We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations.
Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not.
It converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones.
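A sketch of an extra-gradient loop with an AdaGrad-style step size, in the spirit of this summary; the exact adaptation rule of the paper is not reproduced, and the test problem is an assumption:

```python
import numpy as np

def adaptive_extragradient(F, z, iters=500, eps=1e-8):
    """Extra-gradient with an AdaGrad-style step size: the step shrinks
    with the accumulated squared norms of the observed operator values."""
    acc = 0.0
    for _ in range(iters):
        g = F(z)
        acc += g @ g
        eta = 1.0 / np.sqrt(acc + eps)
        z_lead = z - eta * g              # leading (exploration) step
        z = z - eta * F(z_lead)           # update using the leading gradient
    return z

# operator of f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 (saddle at the origin)
F = lambda z: np.array([z[0] + z[1], z[1] - z[0]])
print(np.round(adaptive_extragradient(F, np.array([1.0, 1.0])), 3))  # ~ [0, 0]
```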
arXiv Detail & Related papers (2020-10-22T22:54:54Z)
- Conservative Stochastic Optimization with Expectation Constraints [11.393603788068777]
This paper considers convex optimization problems where the objective and constraint functions involve expectations with respect to the data indices or environmental variables.
Online and efficient approaches for solving such problems have not been widely studied.
We propose a novel conservative stochastic optimization algorithm (CSOA) that achieves zero constraint violation and an $O\left(T^{-\frac{1}{2}}\right)$ optimality gap.
arXiv Detail & Related papers (2020-08-13T08:56:24Z)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
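Schematically, a two-timescale scheme moves the lower-level variable with a larger (fast) step so that it tracks $y^*(x)$, while the upper-level variable moves on a slower timescale; in the sketch below, the step-size exponents and the toy bilevel instance are illustrative assumptions.

```python
def ttsa(grad_outer, grad_inner, x, y, steps=5000):
    """Two-timescale stochastic-approximation sketch: y moves with the
    larger (fast) step and tracks y*(x); x moves on the slow timescale."""
    for k in range(1, steps + 1):
        beta = 1.0 / k ** 0.6             # fast step (inner problem)
        alpha = 1.0 / k ** 0.9            # slow step (outer problem)
        y = y - beta * grad_inner(x, y)
        x = x - alpha * grad_outer(x, y)
    return x, y

# toy bilevel instance: inner g(x, y) = 0.5*(y - x)^2 gives y*(x) = x;
# outer f(x, y) = 0.5*x^2 + 0.5*(y - 1)^2, so with y ~ y*(x) the
# hypergradient is x + (y - 1) and the optimum is x = y = 0.5.
x, y = ttsa(lambda x, y: x + (y - 1.0), lambda x, y: y - x, x=2.0, y=0.0)
print(round(x, 2), round(y, 2))           # ~ 0.5 0.5
```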
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
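A core ingredient in this setting is a randomized two-point gradient estimator; a plain sketch follows (the paper additionally weights the evaluations with a kernel to exploit higher-order smoothness, which is omitted here):

```python
import numpy as np

def two_point_grad(f, x, h=1e-3, rng=None):
    """Randomized two-point gradient estimator for zero-order optimization:
        g = (d / (2h)) * (f(x + h*u) - f(x - h*u)) * u,
    with u uniform on the unit sphere (E[g] ~ grad f(x) for small h)."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)
    return (x.size / (2.0 * h)) * (f(x + h * u) - f(x - h * u)) * u

# zero-order gradient descent on the strongly convex f(x) = 0.5*||x||^2
f = lambda x: 0.5 * float(x @ x)
rng = np.random.default_rng(0)
x = np.ones(4)
for k in range(1, 2001):
    x = x - (0.5 / k) * two_point_grad(f, x, rng=rng)
print(np.round(x, 2))   # close to the minimizer at the origin
```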
arXiv Detail & Related papers (2020-06-14T10:42:23Z)