AdaSub: Stochastic Optimization Using Second-Order Information in
Low-Dimensional Subspaces
- URL: http://arxiv.org/abs/2310.20060v2
- Date: Mon, 6 Nov 2023 21:37:33 GMT
- Title: AdaSub: Stochastic Optimization Using Second-Order Information in
Low-Dimensional Subspaces
- Authors: João Victor Galvão da Mata and Martin S. Andersen
- Abstract summary: We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace.
Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses.
Our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce AdaSub, a stochastic optimization algorithm that computes a
search direction based on second-order information in a low-dimensional
subspace that is defined adaptively based on available current and past
information. Compared to first-order methods, second-order methods exhibit
better convergence characteristics, but the need to compute the Hessian matrix
at each iteration results in excessive computational expenses, making them
impractical. To address this issue, our approach allows computational cost and
algorithm efficiency to be balanced by selecting the dimension of the subspace
used for the search. Our code is freely available on GitHub,
and our preliminary numerical results demonstrate that AdaSub surpasses popular
stochastic optimizers in terms of time and number of iterations required to
reach a given accuracy.
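To make the subspace idea concrete, here is a minimal sketch of a damped Newton-type step restricted to a low-dimensional subspace spanned by the current and a few past gradients. It only illustrates the general approach described in the abstract: the subspace construction, the dimension k, the damping, and the step size are assumptions made here for clarity, not the authors' exact AdaSub update (their GitHub code contains the actual algorithm).

```python
# Illustrative sketch only: a damped Newton-type step restricted to a
# low-dimensional subspace spanned by current and past gradients. The
# subspace choice, dimension k, damping, and step size are assumptions,
# not the authors' exact AdaSub update.
import numpy as np

def subspace_newton_step(grad, hess_vec, past_grads, k=5, damping=1e-4, lr=1.0):
    """Compute an update direction restricted to a k-dimensional subspace.

    grad       : current gradient, shape (d,)
    hess_vec   : callable v -> H @ v (Hessian-vector product, e.g. via autodiff)
    past_grads : list of previous gradients used to span the subspace
    """
    # Orthonormal basis V for the span of the current and recent gradients.
    G = np.column_stack([grad] + past_grads[-(k - 1):])
    V, _ = np.linalg.qr(G)

    # Reduced Hessian and gradient: one Hessian-vector product per basis vector.
    HV = np.column_stack([hess_vec(V[:, i]) for i in range(V.shape[1])])
    H_sub = V.T @ HV
    g_sub = V.T @ grad

    # Solve the small damped Newton system in the subspace and map back to R^d.
    d_sub = np.linalg.solve(H_sub + damping * np.eye(V.shape[1]), g_sub)
    return -lr * (V @ d_sub)

# Toy usage on a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = A.T @ A + np.eye(50)
b = rng.standard_normal(50)
x, past = rng.standard_normal(50), []
for _ in range(25):
    g = A @ x - b
    x = x + subspace_newton_step(g, lambda v: A @ v, past, k=5)
    past.append(g)
print(np.linalg.norm(A @ x - b))  # residual shrinks as the subspace adapts
```

Restricting the Newton system to a k-dimensional subspace means each iteration needs only k Hessian-vector products and the solution of a k-by-k linear system, which is how a method of this kind keeps the per-iteration cost of second-order information manageable.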
Related papers
- Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization [32.939120407900035]
Our algorithms feature a simple update rule that requires solving only one linear system per iteration.
We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
arXiv Detail & Related papers (2024-06-04T06:56:41Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
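For context on the sketch-and-solve pattern described above, the snippet below applies a classical CountSketch with random (unlearned) positions and signs and then solves the compressed least-squares problem; the learned placement of the non-zero entries that the paper proposes is not reproduced here.

```python
# Classical CountSketch (random positions and signs), used in a
# sketch-and-solve least-squares example. The paper's learned positions
# are not reproduced; this is only the unlearned baseline.
import numpy as np

def countsketch(A, b, m, rng):
    """Apply an m x n CountSketch to (A, b): each row of A is hashed to one
    of m buckets with a random sign, so S @ A costs O(nnz(A))."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)      # target row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)   # random sign for each input row
    SA = np.zeros((m, A.shape[1]))
    Sb = np.zeros(m)
    np.add.at(SA, buckets, signs[:, None] * A)  # scatter-add signed rows
    np.add.at(Sb, buckets, signs * b)
    return SA, Sb

# Sketch-and-solve: fit the compressed problem instead of the full one.
rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 20))
b = A @ rng.standard_normal(20) + 0.01 * rng.standard_normal(10000)
SA, Sb = countsketch(A, b, m=400, rng=rng)
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_full))  # small when m is well above the column count
```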
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions is a core algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
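As background only, the snippet below computes entropy-regularized optimal transport with plain Sinkhorn iterations; the paper's extragradient-based method is different and is not reproduced here, so this merely illustrates the regularized problem being targeted.

```python
# Baseline sketch: entropy-regularized optimal transport via Sinkhorn
# iterations. The paper's extragradient method is NOT shown here.
import numpy as np

def sinkhorn_ot(C, mu, nu, eps=0.05, iters=500):
    """Entropy-regularized OT between histograms mu and nu with cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)            # match the column marginal
        u = mu / (K @ v)              # match the row marginal
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return np.sum(P * C)              # transport cost under the plan

rng = np.random.default_rng(0)
x, y = rng.random(30), rng.random(40)
C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost matrix
mu = np.full(30, 1 / 30)
nu = np.full(40, 1 / 40)
print(sinkhorn_ot(C, mu, nu))
```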
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization [0.38233569758620045]
We describe an Adaptive Regularization using Cubics (ARC) method for large-scale unconstrained optimization.
We find that our new approach, ARCLQN, is competitive with modern optimizers while requiring minimal tuning, a common pain point for second-order methods.
arXiv Detail & Related papers (2022-04-19T20:25:29Z)
- Adaptive First- and Second-Order Algorithms for Large-Scale Machine Learning [3.0204520109309843]
We consider first- and second-order techniques to address continuous optimization problems in machine learning.
In the first-order case, we propose a framework of transition from semi-deterministic to quadratic regularization methods.
We further propose a novel first-order algorithm with adaptive sampling and adaptive step size.
arXiv Detail & Related papers (2021-11-29T18:10:00Z)
- Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization [16.96639526117016]
We investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs).
These algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods.
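To illustrate what matrix-free means in this setting, the sketch below approximates an inverse-Hessian-vector product with conjugate gradient, accessing the Hessian only through Hessian-vector products; the paper's own linear-time IHVP estimator is a different construction and is not shown.

```python
# Matrix-free IHVP sketch: conjugate gradient solves (H + damping*I) x = v
# using only Hessian-vector products; the Hessian is never formed.
# This is NOT the paper's linear-time estimator, only an illustration.
import numpy as np

def ihvp_cg(hess_vec, v, iters=100, damping=1e-3, tol=1e-10):
    """Approximate (H + damping*I)^{-1} v via conjugate gradient."""
    x = np.zeros_like(v)
    r = v.copy()          # residual for the initial guess x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hess_vec(p) + damping * p
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Sanity check against a dense solve on a small synthetic Hessian.
rng = np.random.default_rng(0)
H = rng.standard_normal((100, 100))
H = H.T @ H + np.eye(100)             # symmetric positive definite
v = rng.standard_normal(100)
x = ihvp_cg(lambda p: H @ p, v, damping=0.0)
print(np.linalg.norm(H @ x - v))      # close to zero
```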
arXiv Detail & Related papers (2021-07-07T17:01:34Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
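As a minimal illustration of the sketching setup, the snippet below solves L2-regularized least squares with a plain Gaussian embedding of fixed sketch size m; the effective-dimension-adaptive choice of m and the SRHT variant studied in the paper are not included, so m is simply an assumed constant here.

```python
# Sketched ridge regression with a Gaussian embedding of fixed size m.
# The paper's effective-dimension-adaptive sketch size and SRHT variant
# are not shown; m here is an assumed constant for illustration.
import numpy as np

def sketched_ridge(A, b, lam, m, rng):
    """Solve min_x ||S(Ax - b)||^2 + lam*||x||^2 with a Gaussian sketch S."""
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding
    SA, Sb = S @ A, S @ b
    # Normal equations of the sketched regularized problem.
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
b = A @ rng.standard_normal(50) + 0.1 * rng.standard_normal(5000)
x_sketch = sketched_ridge(A, b, lam=1.0, m=400, rng=rng)
x_exact = np.linalg.solve(A.T @ A + 1.0 * np.eye(50), A.T @ b)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))  # small relative error
```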
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)