Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order
Information
- URL: http://arxiv.org/abs/2109.05198v1
- Date: Sat, 11 Sep 2021 06:39:50 GMT
- Title: Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order
Information
- Authors: Majid Jahani, Sergey Rusakov, Zheng Shi, Peter Richtárik, Michael W.
Mahoney, Martin Takáč
- Abstract summary: We present a novel adaptive optimization algorithm for large-scale machine learning problems.
Our method dynamically adapts the direction and step-size.
- Our methodology does not require tedious learning rate tuning.
- Score: 37.70729542263343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel adaptive optimization algorithm for large-scale machine
learning problems. Equipped with a low-cost estimate of local curvature and
Lipschitz smoothness, our method dynamically adapts the search direction and
step-size. The search direction contains gradient information preconditioned by
a well-scaled diagonal preconditioning matrix that captures the local curvature
information. Our methodology does not require the tedious task of learning rate
tuning, as the learning rate is updated automatically without adding an extra
hyperparameter. We provide convergence guarantees on a comprehensive collection
of optimization problems, including convex, strongly convex, and nonconvex
problems, in both deterministic and stochastic regimes. We also conduct an
extensive empirical evaluation on standard machine learning problems,
justifying our algorithm's versatility and demonstrating its strong performance
compared to other state-of-the-art first-order and second-order methods.
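To make the two adaptive ingredients above concrete, here is a minimal, generic Python sketch: a coordinate-wise secant estimate stands in for the diagonal curvature preconditioner, and a local Lipschitz-smoothness estimate sets the step size. The estimators, the moving-average weight, and the floor constant are illustrative assumptions, not the authors' exact update rules.

```python
import numpy as np

def doubly_adaptive_step(x, x_prev, g, g_prev, d_prev, beta=0.95, eps=1e-8):
    """One illustrative parameter update combining the two adaptive
    ingredients described in the abstract: a well-scaled diagonal
    preconditioner capturing local curvature, and a step size derived
    from a local Lipschitz-smoothness estimate (no tuned learning rate).

    Generic sketch only; the coordinate-wise secant estimate, `beta`,
    and `eps` are illustrative choices, not the paper's algorithm.
    """
    s, y = x - x_prev, g - g_prev          # parameter and gradient differences
    # Diagonal curvature estimate from a coordinate-wise secant condition,
    # smoothed by an exponential moving average and floored for stability.
    d = beta * d_prev + (1.0 - beta) * np.abs(y) / (np.abs(s) + eps)
    d = np.maximum(d, eps)
    # Local Lipschitz-smoothness estimate determines the step size.
    L = np.linalg.norm(y) / (np.linalg.norm(s) + eps)
    eta = 1.0 / max(L, eps)
    # Gradient step preconditioned by the diagonal curvature estimate.
    return x - eta * (g / d), d
```

In a training loop one would carry `x_prev`, `g_prev`, and `d_prev` across iterations and, in the stochastic regime, replace `g` with a mini-batch gradient estimate.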
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimization (BLO) problems.
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We also provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization [32.939120407900035]
Our algorithms feature a simple update rule that requires solving only one linear system per iteration.
We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
arXiv Detail & Related papers (2024-06-04T06:56:41Z) - Adaptive Optimization Algorithms for Machine Learning [0.0]
Machine learning assumes a pivotal role in our data-driven world.
This thesis contributes novel insights, introduces new algorithms with improved convergence guarantees, and improves analyses of popular practical algorithms.
arXiv Detail & Related papers (2023-11-16T21:22:47Z) - Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Mechanic: A Learning Rate Tuner [52.4242550204696]
We introduce a technique for automatically tuning the learning rate scale factor of any base optimization algorithm and schedule, which we call Mechanic.
We rigorously evaluate Mechanic on a range of large-scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms.
arXiv Detail & Related papers (2023-05-31T19:32:43Z) - Adaptive First- and Second-Order Algorithms for Large-Scale Machine
Learning [3.0204520109309843]
We consider first- and second-order techniques to address continuous optimization problems in machine learning.
In the first-order case, we propose a framework for transitioning from semi-deterministic to quadratic regularization methods.
In the second-order case, we propose a novel first-order algorithm with adaptive sampling and adaptive step size.
arXiv Detail & Related papers (2021-11-29T18:10:00Z) - A Probabilistically Motivated Learning Rate Adaptation for Stochastic
Optimization [20.77923050735746]
We provide a probabilistic motivation, in terms of Gaussian inference, for popular first-order methods.
The inference allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training.
The resulting meta-algorithm is shown to adapt learning rates in a robust manner across a large range of initial values.
arXiv Detail & Related papers (2021-02-22T10:26:31Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT); a brief sketch-and-solve illustration appears after this list.
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic
Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform zeroth-order gradient descent and its design variants.
arXiv Detail & Related papers (2020-05-19T07:44:52Z)
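As noted in the sketching entry above, here is a minimal sketch-and-solve illustration for L2-regularized least squares using a Gaussian embedding only. It is a plain baseline under standard assumptions; the paper's effective-dimension-adaptive choice of sketch size and its SRHT variant are not reproduced here.

```python
import numpy as np

def sketched_ridge(A, b, lam, m, seed=0):
    """Sketch-and-solve for min_x ||A x - b||^2 + lam ||x||^2.

    A Gaussian embedding S of shape (m, n) compresses the n rows of A,
    and the ridge problem is solved on the sketched data (S A, S b).
    The sketch size m is left to the caller; tying it to the effective
    dimension, as the paper does, is not attempted here.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding
    SA, Sb = S @ A, S @ b
    # Sketched regularized normal equations: (SA^T SA + lam I) x = SA^T Sb.
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)
```

Note that applying a dense Gaussian embedding costs O(mnd); structured embeddings such as the SRHT reduce this cost, which is one reason the paper considers them.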