An algorithmic view of $\ell_2$ regularization and some path-following
algorithms
- URL: http://arxiv.org/abs/2107.03322v1
- Date: Wed, 7 Jul 2021 16:00:13 GMT
- Title: An algorithmic view of $\ell_2$ regularization and some path-following
algorithms
- Authors: Yunzhang Zhu and Renxiong Liu
- Abstract summary: We establish an equivalence between the $\ell_2$-regularized solution path for a convex loss function and the solution of an ordinary differential equation (ODE).
This equivalence reveals that the solution path can be viewed as the flow of a hybrid of gradient descent and Newton's method applied to the empirical loss.
New path-following algorithms based on homotopy methods and numerical ODE solvers are proposed to numerically approximate the solution path.
- Score: 7.6146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish an equivalence between the $\ell_2$-regularized solution path
for a convex loss function and the solution of an ordinary differential
equation (ODE). Importantly, this equivalence reveals that the solution path
can be viewed as the flow of a hybrid of gradient descent and Newton's method
applied to the empirical loss, which is similar to the widely used optimization
technique called the trust region method. This provides an interesting algorithmic
view of $\ell_2$ regularization, and is in contrast to the conventional view
that the $\ell_2$ regularization solution path is similar to the gradient flow
of the empirical loss. New path-following algorithms based on homotopy methods
and numerical ODE solvers are proposed to numerically approximate the solution
path. In particular, we consider Newton's method and gradient descent,
respectively, as the base algorithm for the homotopy method, and establish
their approximation error rates over the solution path. Importantly, our theory
suggests novel schemes to choose grid points that guarantee an arbitrarily
small suboptimality for the solution path. In terms of computational cost, we
prove that in order to achieve an $\epsilon$-suboptimality for the entire
solution path, the number of Newton steps required for the Newton method is
$\mathcal O(\epsilon^{-1/2})$, while the number of gradient steps required for
the gradient descent method is $\mathcal O\left(\epsilon^{-1}
\ln(\epsilon^{-1})\right)$. Finally, we use $\ell_2$-regularized logistic
regression as an illustrating example to demonstrate the effectiveness of the
proposed path-following algorithms.
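The claimed equivalence can be motivated by differentiating the stationarity condition of the regularized problem. The sketch below is not taken from the paper; it assumes the path is parameterized directly by the regularization parameter $\lambda$ and that the empirical loss $L$ is twice differentiable, so details (in particular the paper's choice of parameterization) may differ.
```latex
% Stationarity of beta(lambda) = argmin_beta { L(beta) + (lambda/2) ||beta||_2^2 }:
%   \nabla L(\beta(\lambda)) + \lambda \beta(\lambda) = 0.
% Differentiating in lambda, and using beta(lambda) = -\nabla L(\beta(\lambda)) / lambda:
\dot\beta(\lambda)
  = -\bigl(\nabla^2 L(\beta(\lambda)) + \lambda I\bigr)^{-1}\beta(\lambda)
  = \frac{1}{\lambda}\bigl(\nabla^2 L(\beta(\lambda)) + \lambda I\bigr)^{-1}\nabla L(\beta(\lambda)).
```
The right-hand side interpolates between a scaled gradient direction when $\lambda$ dominates the Hessian and a Newton direction when $\lambda$ is small, which is the "hybrid of gradient descent and Newton's method" reading of the path.

As a concrete illustration, the following Python sketch approximates the $\ell_2$-regularized logistic regression path with a simple Newton-based homotopy: each grid point is warm-started from the previous solution and refined with a few Newton steps. The grid, step counts, and synthetic data are assumptions for illustration only, not the grid-selection schemes analyzed in the paper.
```python
import numpy as np

def logistic_loss_grad_hess(beta, X, y):
    """Empirical logistic loss, gradient, and Hessian; y is assumed to take values in {0, 1}."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # predicted probabilities
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / n
    hess = (X.T * (p * (1 - p))) @ X / n            # X^T diag(p(1-p)) X / n
    return loss, grad, hess

def l2_path_newton_homotopy(X, y, lambdas, newton_steps=3):
    """Approximate the l2-regularized solution path beta(lambda) by warm-starting
    each grid point at the previous solution and taking a few full Newton steps
    on the regularized loss L(beta) + (lambda/2)||beta||^2."""
    d = X.shape[1]
    beta = np.zeros(d)                              # solution near 0 for large lambda
    path = []
    for lam in lambdas:                             # lambdas assumed sorted in decreasing order
        for _ in range(newton_steps):
            _, grad, hess = logistic_loss_grad_hess(beta, X, y)
            beta = beta - np.linalg.solve(hess + lam * np.eye(d), grad + lam * beta)
        path.append(beta.copy())
    return np.array(path)

# Example usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X @ rng.standard_normal(5) + 0.5 * rng.standard_normal(200) > 0).astype(float)
lambdas = np.geomspace(10.0, 1e-3, num=50)          # decreasing grid; spacing is an assumption
path = l2_path_newton_homotopy(X, y, lambdas)
print(path.shape)                                   # (50, 5): one coefficient vector per grid point
```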
Related papers
- Beyond Discretization: Learning the Optimal Solution Path [3.9675360887122646]
We propose an approach that parameterizes the solution path with a set of basis functions and solves a single optimization problem.
Our method offers substantial complexity improvements over discretization.
We also prove stronger results for special cases common in machine learning.
arXiv Detail & Related papers (2024-10-18T22:23:42Z) - A quantum central path algorithm for linear optimization [5.450016817940232]
We propose a novel quantum algorithm for solving linear optimization problems by quantum-mechanical simulation of the central path.
This approach yields an algorithm for solving linear optimization problems involving $m$ constraints and $n$ variables to $\varepsilon$-optimality.
In the standard gate model (i.e., without access to quantum RAM), our algorithm can obtain highly precise solutions to LO problems using at most $\mathcal{O}\left(\sqrt{m+n}\,\textsf{nnz}(A)\,\frac{R_1}{\cdots}\right)$
arXiv Detail & Related papers (2023-11-07T13:26:20Z) - A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning [9.628032156001073]
We propose two policy Newton algorithms that incorporate cubic regularization.
Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function.
In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
arXiv Detail & Related papers (2023-04-21T13:43:06Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of a general nonconvex conic optimization problem, i.e., minimizing a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - DRSOM: A Dimension Reduced Second-Order Method [13.778619250890406]
Under a trust-region-like framework, our method preserves the convergence of the second-order method while using only information in a few directions.
Theoretically, we show that the method has local convergence and a global convergence rate of $O(\epsilon^{-3/2})$ to satisfy the first-order and second-order conditions.
arXiv Detail & Related papers (2022-07-30T13:05:01Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - A Unified Single-loop Alternating Gradient Projection Algorithm for
Nonconvex-Concave and Convex-Nonconcave Minimax Problems [8.797831153231664]
We develop an efficient algorithm for solving minimax problems with a general objective and theoretical guarantees.
We show that the proposed algorithm can be used to solve both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems.
arXiv Detail & Related papers (2020-06-03T04:00:52Z) - Second-order Conditional Gradient Sliding [79.66739383117232]
We present the Second-Order Conditional Gradient Sliding (SOCGS) algorithm.
The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations.
It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.