Avoiding strict saddle points of nonconvex regularized problems
- URL: http://arxiv.org/abs/2401.09274v5
- Date: Thu, 12 Dec 2024 12:53:10 GMT
- Title: Avoiding strict saddle points of nonconvex regularized problems
- Authors: Luwei Bai, Yaohua Hu, Hao Wang, Xiaoqi Yang,
- Abstract summary: We show the second terms optimality conditions only depend on the nonzeros of the stationary points.<n>We propose two re-weighted iterativeweighted algorithms including the iteratively reweighted $ell_1$.<n>We show these algorithms converge only to the local saddlers with randomly when the saddle point property is assumed.
- Score: 3.92625489118339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider a class of non-convex and non-smooth sparse optimization problems, which encompass most existing nonconvex sparsity-inducing terms. We show the second-order optimality conditions only depend on the nonzeros of the stationary points. We propose two damped iterative reweighted algorithms including the iteratively reweighted $\ell_1$ algorithm (DIRL$_1$) and the iteratively reweighted $\ell_2$ (DIRL$_2$) algorithm, to solve these problems. For DIRL$_1$, we show the reweighted $\ell_1$ subproblem has support identification property so that DIRL$_1$ locally reverts to a gradient descent algorithm around a stationary point. For DIRL$_2$, we show the solution map of the reweighted $\ell_2$ subproblem is differentiable and Lipschitz continuous everywhere. Therefore, the map of DIRL$_1$ and DIRL$_2$ and their inverse are Lipschitz continuous, and the strict saddle points are their unstable fixed points. By applying the stable manifold theorem, these algorithms are shown to converge only to local minimizers with randomly initialization when the strictly saddle point property is assumed.
Related papers
- Gradient Norm Regularization Second-Order Algorithms for Solving Nonconvex-Strongly Concave Minimax Problems [2.3721580767877257]
We propose an algorithm for solving non-strongly concave minimax problems.
We show that the proposed algorithm achieves an $mathcal.
stilon.
$second-order norm is proved to be upper by.
$tk.
$k.
$k.
$k.
$k.
$k.
$k.
$k.
$k.
$k.
$k.
$k
arXiv Detail & Related papers (2024-11-24T09:46:36Z) - Alternating Iteratively Reweighted $\ell_1$ and Subspace Newton Algorithms for Nonconvex Sparse Optimization [11.56128809794923]
We present a novel hybrid algorithm for minimizing the sum of a differentiable loss function and a nonsmooth regularization function.
We prove global convergence to a critical point and, under suitable conditions, demonstrate that the algorithm outperforms existing methods.
arXiv Detail & Related papers (2024-07-24T12:15:59Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning [9.628032156001073]
We propose two policy Newton algorithms that incorporate cubic regularization.
Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function.
In particular, the sample complexity of our algorithms to find an $epsilon$-SOSP is $O(epsilon-3.5)$, which is an improvement over the state-of-the-art sample complexity of $O(epsilon-4.5)$.
arXiv Detail & Related papers (2023-04-21T13:43:06Z) - Differentially Private Algorithms for the Stochastic Saddle Point
Problem with Optimal Rates for the Strong Gap [12.446156563700482]
We show that convex-concave Lipschitz saddle point problems can be solved under the constraint of $(epsilon,delta)$differential privacy.
We also show that there exists a fundamental tradeoff between stability and accuracy.
arXiv Detail & Related papers (2023-02-24T21:50:02Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - A gradient estimator via L1-randomization for online zero-order
optimization with two point feedback [93.57603470949266]
We present a novel gradient estimator based on two function evaluation and randomization.
We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise.
We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem.
arXiv Detail & Related papers (2022-05-27T11:23:57Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of oracles and the linear-minimization oracle required by the proposed algorithm, are of order $mathcalO_T(epsilon-2)$ and $mathcalO_T(epsilon-3)$ respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $min_x max_y f(x) + langle Ax, yrangle - g*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - The Complexity of Nonconvex-Strongly-Concave Minimax Optimization [43.07732143522183]
This paper establishes the complexity for finding approximate stationary points of non-strongly-concave (NC-SC) smooth minimax problems.
We deploy a proposed sequence of $Omega-strong$lyconcave sub-2 problems in both general complexity and averaged complexity.
In our proposed finite-sum setting, our proposed algorithm provides a nearly-tight dependence on the condition number.
arXiv Detail & Related papers (2021-03-29T18:53:57Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Escaping Saddle Points for Nonsmooth Weakly Convex Functions via
Perturbed Proximal Algorithms [0.0]
Main results are based on a novel characterization of $epsilon$-approximate local minimum for nonsmooth functions.
We show that under standard assumptions, the perturbed proximal point, perturbed proximal gradient and perturbed proximal linear algorithms find $epsilon$-approximate local minimum for nonsmooth weakly convex functions.
arXiv Detail & Related papers (2021-02-04T19:17:13Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.