A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic
Regularized Optimization
- URL: http://arxiv.org/abs/2204.09116v1
- Date: Tue, 19 Apr 2022 20:25:29 GMT
- Title: A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic
Regularized Optimization
- Authors: Jarad Forristal, Joshua Griffin, Wenwen Zhou, Seyedalireza Yektamaram
- Abstract summary: We describe an Adaptive Regularization using Cubics (ARC) method for large-scale unconstrained optimization.
We find that our new approach, ARCLQN, compares to modern optimizers with minimal tuning, a common pain point for second-order methods.
- Score: 0.38233569758620045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we describe an Adaptive Regularization using Cubics (ARC) method
for large-scale nonconvex unconstrained optimization using Limited-memory
Quasi-Newton (LQN) matrices. ARC methods are a relatively new family of
optimization strategies that utilize a cubic-regularization (CR) term in place
of trust-regions and line-searches. LQN methods offer a large-scale alternative
to using explicit second-order information by taking identical inputs to those
used by popular first-order methods such as stochastic gradient descent (SGD).
Solving the CR subproblem exactly requires Newton's method, yet using
properties of the internal structure of LQN matrices, we are able to find exact
solutions to the CR subproblem in a matrix-free manner, providing large
speedups and scaling into modern size requirements. Additionally, we expand
upon previous ARC work and explicitly incorporate first-order updates into our
algorithm. We provide experimental results when the SR1 update is used, which
show substantial speed-ups and competitive performance compared to Adam and
other second order optimizers on deep neural networks (DNNs). We find that our
new approach, ARCLQN, compares to modern optimizers with minimal tuning, a
common pain point for second-order methods.
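To make the core computation concrete, below is a minimal sketch of an exact cubic-regularization subproblem solve: the model $\min_s\, g^\top s + \tfrac{1}{2} s^\top B s + \tfrac{\sigma}{3}\|s\|^3$ is minimized by root-finding on a scalar secular equation after an eigendecomposition of $B$. This is a generic dense illustration under stated assumptions (dense symmetric $B$, SciPy available, hard case ignored), not the paper's matrix-free ARCLQN solver, which reaches the same exact solution without forming $B$ by exploiting the compact low-rank-plus-diagonal structure of L-SR1 matrices; the function name solve_cr_subproblem is an illustrative choice.

```python
# Minimal illustrative sketch, NOT the paper's matrix-free ARCLQN solver:
# it solves the cubic-regularized (CR) subproblem
#     min_s  g^T s + 0.5 * s^T B s + (sigma/3) * ||s||^3
# exactly via a dense eigendecomposition of B plus a 1-D root find on the
# secular equation.  The paper's contribution is to get the same exact step
# without forming B, using the compact structure of L-SR1 matrices.
import numpy as np
from scipy.optimize import brentq


def solve_cr_subproblem(g, B, sigma):
    """Exact global minimizer of the cubic model (hard case ignored)."""
    lam, V = np.linalg.eigh(B)          # B = V diag(lam) V^T, lam ascending
    c = V.T @ g                         # gradient in the eigenbasis
    lo = max(0.0, -lam[0]) + 1e-12      # theta must exceed max(0, -lambda_min)

    # Optimality: (B + theta I) s = -g with theta = sigma * ||s||.
    # phi is strictly decreasing on (lo, inf), so its root is unique.
    def phi(theta):
        return np.linalg.norm(c / (lam + theta)) - theta / sigma

    if phi(lo) <= 0.0:                  # essentially g = 0 and B PSD: zero step
        theta = lo
    else:
        hi = lo + 1.0
        while phi(hi) > 0.0:            # expand the bracket until phi changes sign
            hi *= 2.0
        theta = brentq(phi, lo, hi)
    return -V @ (c / (lam + theta))


# Tiny usage example on a random indefinite quadratic-plus-cubic model.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = 0.5 * (A + A.T)                     # symmetric, possibly indefinite
g = rng.standard_normal(6)
sigma = 1.0
s = solve_cr_subproblem(g, B, sigma)
theta_star = sigma * np.linalg.norm(s)
print("optimality residual:", np.linalg.norm((B + theta_star * np.eye(6)) @ s + g))
```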
Related papers
- SGD with Partial Hessian for Deep Neural Networks Optimization [18.78728272603732]
We propose a compound optimizer, which combines a second-order optimizer with a precise partial Hessian matrix for updating channel-wise parameters and the first-order stochastic gradient descent (SGD) optimizer for updating the other parameters.
Compared with first-order optimizers, it adopts a certain amount of information from the Hessian matrix to assist optimization, while compared with existing second-order optimizers, it keeps the good generalization performance of first-order optimizers.
arXiv Detail & Related papers (2024-03-05T06:10:21Z) - l1-norm regularized l1-norm best-fit lines [3.0963566281269594]
We present a novel fitting procedure, utilizing simple ratios and sorting techniques.
The proposed algorithm demonstrates a worst-case time complexity of $O(n^2 m \log n)$ and, in certain instances, achieves global optimality for the sparse subspace.
arXiv Detail & Related papers (2024-02-26T16:30:58Z) - AdaSub: Stochastic Optimization Using Second-Order Information in
Low-Dimensional Subspaces [0.0]
We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace (a generic subspace-Newton sketch appears after this list).
Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses.
Our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
arXiv Detail & Related papers (2023-10-30T22:24:23Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - AskewSGD : An Annealed interval-constrained Optimisation method to train
Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state-of-the-art methods on classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - SCORE: Approximating Curvature Information under Self-Concordant
Regularization [0.0]
We propose a generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization speed each time it receives a new input.
The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead.
arXiv Detail & Related papers (2021-12-14T13:03:04Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT); a minimal sketch-and-solve illustration appears after this list.
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
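For the AdaSub entry above, here is a generic subspace-Newton sketch under stated assumptions: a toy regularized logistic-regression objective, a random orthonormal subspace, finite-difference Hessian-vector products, and a simple backtracking line search. It illustrates the general idea of using second-order information only inside a low-dimensional subspace and is not the AdaSub algorithm itself; all function names are illustrative.

```python
# Generic subspace-Newton illustration (a stand-in sketch, not the AdaSub
# algorithm itself): take a damped Newton step restricted to a random
# low-dimensional subspace, building the projected Hessian from
# finite-difference Hessian-vector products so the full Hessian is never formed.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(w, X, y, reg=1e-2):
    """Regularized logistic loss (toy objective for the illustration)."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * reg * (w @ w)


def grad(w, X, y, reg=1e-2):
    return X.T @ (sigmoid(X @ w) - y) / len(y) + reg * w


def subspace_newton_step(w, X, y, k, rng, fd_eps=1e-6, damping=1e-8):
    d = w.size
    g = grad(w, X, y)
    V, _ = np.linalg.qr(rng.standard_normal((d, k)))    # orthonormal k-dim subspace
    # Each column of HV approximates H @ v_j via a finite difference of gradients.
    HV = np.column_stack([(grad(w + fd_eps * V[:, j], X, y) - g) / fd_eps
                          for j in range(k)])
    Hk = 0.5 * (V.T @ HV + HV.T @ V) + damping * np.eye(k)  # projected Hessian, symmetrized
    direction = V @ np.linalg.solve(Hk, V.T @ g)            # Newton direction in the subspace
    step = 1.0                                              # simple backtracking line search
    while loss(w - step * direction, X, y) > loss(w, X, y) and step > 1e-4:
        step *= 0.5
    return w - step * direction


# Usage on a synthetic logistic-regression problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 100))
y = (X @ rng.standard_normal(100) > 0).astype(float)
w = np.zeros(100)
for _ in range(20):
    w = subspace_newton_step(w, X, y, k=10, rng=rng)
print("final loss:", loss(w, X, y), " gradient norm:", np.linalg.norm(grad(w, X, y)))
```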
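For the two sketching entries above, here is a minimal sketch-and-solve baseline for L2-regularized least squares with a plain Gaussian embedding. It is a reference illustration only, not the adaptive, effective-dimension-based algorithms those papers propose; the function name sketched_ridge, the sketch size m, and the toy data are assumptions.

```python
# Minimal sketch-and-solve baseline for L2-regularized least squares with a
# plain Gaussian embedding.  This is a reference illustration only, not the
# adaptive, effective-dimension-based algorithms of the papers above; the
# sketch size m and the toy data are arbitrary assumptions.
import numpy as np


def sketched_ridge(A, b, lam, m, rng):
    """Approximate argmin_x ||A x - b||^2 + lam * ||x||^2 from a sketched problem."""
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding, E[S^T S] = I
    SA, Sb = S @ A, S @ b
    # Solve the small d x d sketched normal equations (SA^T SA + lam I) x = SA^T Sb.
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)


# Usage: compare against the exact ridge solution on synthetic data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 50))
b = A @ rng.standard_normal(50) + 0.01 * rng.standard_normal(20000)
x_sketch = sketched_ridge(A, b, lam=1.0, m=500, rng=rng)
x_exact = np.linalg.solve(A.T @ A + 1.0 * np.eye(50), A.T @ b)
print("relative error:", np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```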