Super-Universal Regularized Newton Method
- URL: http://arxiv.org/abs/2208.05888v1
- Date: Thu, 11 Aug 2022 15:44:56 GMT
- Title: Super-Universal Regularized Newton Method
- Authors: Nikita Doikov, Konstantin Mishchenko, Yurii Nesterov
- Abstract summary: We introduce a family of problem classes characterized by Hölder continuity of either the second or third derivative.
We present the method with a simple adaptive search procedure allowing an automatic adjustment to the problem class with the best global complexity bounds.
- Score: 14.670197265564196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the performance of a variant of Newton's method with quadratic
regularization for solving composite convex minimization problems. At each step
of our method, we choose the regularization parameter proportional to a certain
power of the gradient norm at the current point. We introduce a family of
problem classes characterized by Hölder continuity of either the second or
third derivative. Then we present the method with a simple adaptive search
procedure allowing an automatic adjustment to the problem class with the best
global complexity bounds, without knowing specific parameters of the problem.
In particular, for the class of functions with Lipschitz continuous third
derivative, we get the global $O(1/k^3)$ rate, which was previously attributed
to third-order tensor methods. When the objective function is uniformly convex,
we justify an automatic acceleration of our scheme, resulting in a faster
global rate and local superlinear convergence. The switching between the
different rates (sublinear, linear, and superlinear) is automatic; again, no
a priori knowledge of the problem parameters is needed.
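As a concrete illustration of the update rule described above, here is a minimal Python sketch of a quadratically regularized Newton iteration in the unconstrained smooth case: the regularization parameter is set proportional to a power of the current gradient norm, and a doubling search on the constant H stands in for the paper's adaptive procedure. The power alpha, the sufficient-decrease acceptance test, and the halving schedule are illustrative assumptions, not the exact criteria from the paper, and the composite (non-smooth) part of the objective is omitted.

```python
import numpy as np

def regularized_newton_step(grad, hess, x, H, alpha):
    """One quadratically regularized Newton step:
    x_new = x - (hess(x) + lam * I)^{-1} grad(x),
    with lam proportional to a power of the gradient norm."""
    g = grad(x)
    lam = (H * np.linalg.norm(g)) ** alpha
    A = hess(x) + lam * np.eye(x.size)
    return x - np.linalg.solve(A, g), lam

def adaptive_regularized_newton(f, grad, hess, x0, alpha=0.5, H=1.0,
                                max_iter=200, tol=1e-8):
    """Adaptive loop: H is doubled until a simple sufficient-decrease test
    accepts the step (a stand-in for the paper's acceptance criterion),
    then relaxed again for the next iteration."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        if np.linalg.norm(grad(x)) <= tol:
            break
        while True:
            x_new, lam = regularized_newton_step(grad, hess, x, H, alpha)
            # accept once the regularization is strong enough that the
            # step produces a clear decrease of the objective
            if f(x_new) <= f(x) - 0.5 * lam * np.linalg.norm(x_new - x) ** 2:
                break
            H *= 2.0
        x, H = x_new, max(H / 2.0, 1e-12)
    return x

# Example: a uniformly convex quartic, f(x) = 1/4 * sum(x_i^4) + 1/2 * ||x||^2.
if __name__ == "__main__":
    f = lambda x: 0.25 * np.sum(x ** 4) + 0.5 * (x @ x)
    grad = lambda x: x ** 3 + x
    hess = lambda x: np.diag(3.0 * x ** 2 + 1.0)
    x_min = adaptive_regularized_newton(f, grad, hess, 3.0 * np.ones(5))
    print(np.linalg.norm(grad(x_min)))  # should be close to zero
```

With alpha = 0.5 the rule reduces to lam = sqrt(H * ||grad f(x)||), the choice typically associated with a Lipschitz continuous Hessian; other powers correspond to the other Hölder smoothness classes mentioned in the abstract.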
Related papers
- Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization [8.879403568685499]
We introduce an adaptive updating strategy for smoothing parameters.
This behavior transforms the algorithm into one that effectively solves problems after a few iterations.
We prove global convergence of the proposed algorithm, guaranteeing that every accumulation point is a critical point.
arXiv Detail & Related papers (2024-06-22T02:37:13Z)
- First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians [4.62316736194615]
We develop first-order (Hessian-free) and zeroth-order (derivative-free) implementations of the regularized Newton method for general non-convex optimization problems.
We also equip our algorithms with a lazy Hessian update that reuses a previously computed Hessian approximation matrix for several iterations.
arXiv Detail & Related papers (2023-09-05T17:40:54Z)
- Minimizing Quasi-Self-Concordant Functions by Gradient Regularization of Newton Method [4.62316736194615]
We study the composite convex optimization problems with a Quasi-Self-Concordant smooth component.
This problem class naturally interpolates between classic Self-Concordant functions and functions with Lipschitz continuous Hessian.
We show that for minimizing Quasi-Self-Concordant functions we can use instead the basic Newton Method with Gradient Regularization.
arXiv Detail & Related papers (2023-08-28T17:43:04Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with an $O(T^{-3})$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
arXiv Detail & Related papers (2023-02-20T15:33:26Z)
- Second-order optimization with lazy Hessians [55.51077907483634]
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems.
We reuse a previously computed Hessian for several iterations while computing new gradients at each step of the method (a minimal sketch of this lazy-update idea appears after this list).
arXiv Detail & Related papers (2022-12-01T18:58:26Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes, with complexity independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Optimizing generalization on the train set: a novel gradient-based framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z)
- Gradient-Free Methods for Saddle-Point Problem [125.99533416395765]
We generalize the approach of Gasnikov et al. (2017), which allows one to solve (stochastic) convex optimization problems with an inexact gradient-free oracle.
Our approach reduces the required number of oracle calls by a factor of $\frac{n}{\log n}$.
In the second part of the paper, we analyze the case when such an assumption cannot be made and propose a general approach for modifying the method to handle this setting.
arXiv Detail & Related papers (2020-05-12T16:44:27Z)
- First Order Methods take Exponential Time to Converge to Global Minimizers of Non-Convex Functions [28.776950569604026]
In this work, we provide bounds on the fundamental hardness of finding the global minimizer of a non-convex function.
We show that the parameter estimation problem is equivalent to the problem of family identification.
arXiv Detail & Related papers (2020-02-28T18:28:43Z)
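Two of the entries above (the lazy approximated Hessians and the lazy-Hessian second-order method) revolve around the same implementation idea: recompute and factorize the Hessian only once every few iterations while still using a fresh gradient at every step. The following Python sketch illustrates that idea under the assumption that the damped Hessian stays positive definite; the reuse period, the fixed damping lam, and the plain Newton-type step are illustrative choices, not the schedules analyzed in those papers.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lazy_hessian_newton(grad, hess, x0, reuse=5, lam=1e-3,
                        max_iter=200, tol=1e-8):
    """Newton-type iteration that recomputes and factorizes the (damped)
    Hessian only every `reuse` steps, reusing the cached Cholesky factor
    with a freshly computed gradient in between."""
    x = np.asarray(x0, dtype=float).copy()
    chol = None
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        if chol is None or k % reuse == 0:
            # expensive step: new Hessian evaluation plus factorization
            chol = cho_factor(hess(x) + lam * np.eye(x.size))
        # cheap step: back-substitution with the cached factorization
        x = x - cho_solve(chol, g)
    return x
```

Caching the factorization, rather than only the Hessian matrix, is what makes the in-between iterations cheap: each of them costs one gradient evaluation and one back-substitution.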
This list is automatically generated from the titles and abstracts of the papers in this site.