AdaSub: Stochastic Optimization Using Second-Order Information in
Low-Dimensional Subspaces
- URL: http://arxiv.org/abs/2310.20060v2
- Date: Mon, 6 Nov 2023 21:37:33 GMT
- Title: AdaSub: Stochastic Optimization Using Second-Order Information in
Low-Dimensional Subspaces
- Authors: João Victor Galvão da Mata and Martin S. Andersen
- Abstract summary: We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace.
Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses.
Our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce AdaSub, a stochastic optimization algorithm that computes a
search direction based on second-order information in a low-dimensional
subspace that is defined adaptively based on available current and past
information. Compared to first-order methods, second-order methods exhibit
better convergence characteristics, but the need to compute the Hessian matrix
at each iteration results in excessive computational expenses, making them
impractical. To address this issue, our approach allows computational cost and
algorithm efficiency to be balanced by selecting the dimension of the subspace
used for the search. Our code is freely available on GitHub,
and our preliminary numerical results demonstrate that AdaSub surpasses popular
stochastic optimizers in terms of time and number of iterations required to
reach a given accuracy.
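To make the subspace idea concrete, here is a minimal sketch of a damped Newton-type step restricted to a low-dimensional subspace spanned by the current and a few past gradients. It only illustrates the general approach described in the abstract: the subspace construction, the dimension k, the damping, and the step size are assumptions made here for clarity, not the authors' exact AdaSub update (their GitHub code contains the actual algorithm).

```python
# Illustrative sketch only: a damped Newton-type step restricted to a
# low-dimensional subspace spanned by current and past gradients. The
# subspace choice, dimension k, damping, and step size are assumptions,
# not the authors' exact AdaSub update.
import numpy as np

def subspace_newton_step(grad, hess_vec, past_grads, k=5, damping=1e-4, lr=1.0):
    """Compute an update direction restricted to a k-dimensional subspace.

    grad       : current gradient, shape (d,)
    hess_vec   : callable v -> H @ v (Hessian-vector product, e.g. via autodiff)
    past_grads : list of previous gradients used to span the subspace
    """
    # Orthonormal basis V for the span of the current and recent gradients.
    G = np.column_stack([grad] + past_grads[-(k - 1):])
    V, _ = np.linalg.qr(G)

    # Reduced Hessian and gradient: one Hessian-vector product per basis vector.
    HV = np.column_stack([hess_vec(V[:, i]) for i in range(V.shape[1])])
    H_sub = V.T @ HV
    g_sub = V.T @ grad

    # Solve the small damped Newton system in the subspace and map back to R^d.
    d_sub = np.linalg.solve(H_sub + damping * np.eye(V.shape[1]), g_sub)
    return -lr * (V @ d_sub)

# Toy usage on a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = A.T @ A + np.eye(50)
b = rng.standard_normal(50)
x, past = rng.standard_normal(50), []
for _ in range(25):
    g = A @ x - b
    x = x + subspace_newton_step(g, lambda v: A @ v, past, k=5)
    past.append(g)
print(np.linalg.norm(A @ x - b))  # residual shrinks as the subspace adapts
```

Restricting the Newton system to a k-dimensional subspace means each iteration needs only k Hessian-vector products and the solution of a k-by-k linear system, which is how a method of this kind keeps the per-iteration cost of second-order information manageable.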
Related papers
- Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization [32.939120407900035]
Our algorithms feature a simple update rule that requires solving only one linear system per iteration.
We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
arXiv Detail & Related papers (2024-06-04T06:56:41Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
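For context on the sketch-and-solve pattern described above, the snippet below applies a classical CountSketch with random (unlearned) positions and signs and then solves the compressed least-squares problem; the learned placement of the non-zero entries that the paper proposes is not reproduced here.

```python
# Classical CountSketch (random positions and signs), used in a
# sketch-and-solve least-squares example. The paper's learned positions
# are not reproduced; this is only the unlearned baseline.
import numpy as np

def countsketch(A, b, m, rng):
    """Apply an m x n CountSketch to (A, b): each row of A is hashed to one
    of m buckets with a random sign, so S @ A costs O(nnz(A))."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)      # target row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)   # random sign for each input row
    SA = np.zeros((m, A.shape[1]))
    Sb = np.zeros(m)
    np.add.at(SA, buckets, signs[:, None] * A)  # scatter-add signed rows
    np.add.at(Sb, buckets, signs * b)
    return SA, Sb

# Sketch-and-solve: fit the compressed problem instead of the full one.
rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 20))
b = A @ rng.standard_normal(20) + 0.01 * rng.standard_normal(10000)
SA, Sb = countsketch(A, b, m=400, rng=rng)
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_full))  # small when m is well above the column count
```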
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions is a core algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
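As background only, the snippet below computes entropy-regularized optimal transport with plain Sinkhorn iterations; the paper's extragradient-based method is different and is not reproduced here, so this merely illustrates the regularized problem being targeted.

```python
# Baseline sketch: entropy-regularized optimal transport via Sinkhorn
# iterations. The paper's extragradient method is NOT shown here.
import numpy as np

def sinkhorn_ot(C, mu, nu, eps=0.05, iters=500):
    """Entropy-regularized OT between histograms mu and nu with cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)            # match the column marginal
        u = mu / (K @ v)              # match the row marginal
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return np.sum(P * C)              # transport cost under the plan

rng = np.random.default_rng(0)
x, y = rng.random(30), rng.random(40)
C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost matrix
mu = np.full(30, 1 / 30)
nu = np.full(40, 1 / 40)
print(sinkhorn_ot(C, mu, nu))
```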
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization [0.38233569758620045]
We describe an Adaptive Regularization using Cubics (ARC) method for large-scale unconstrained optimization.
We find that our new approach, ARCLQN, is competitive with modern optimizers while requiring minimal tuning, a common pain point for second-order methods.
arXiv Detail & Related papers (2022-04-19T20:25:29Z)
- Adaptive First- and Second-Order Algorithms for Large-Scale Machine Learning [3.0204520109309843]
We consider first- and second-order techniques to address continuous optimization problems in machine learning.
In the first-order case, we propose a framework of transition from semi-deterministic to quadratic regularization methods.
We further propose a novel first-order algorithm with adaptive sampling and adaptive step size.
arXiv Detail & Related papers (2021-11-29T18:10:00Z)
- Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization [16.96639526117016]
We investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs).
These algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods.
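To illustrate what matrix-free means in this setting, the sketch below approximates an inverse-Hessian-vector product with conjugate gradient, accessing the Hessian only through Hessian-vector products; the paper's own linear-time IHVP estimator is a different construction and is not shown.

```python
# Matrix-free IHVP sketch: conjugate gradient solves (H + damping*I) x = v
# using only Hessian-vector products; the Hessian is never formed.
# This is NOT the paper's linear-time estimator, only an illustration.
import numpy as np

def ihvp_cg(hess_vec, v, iters=100, damping=1e-3, tol=1e-10):
    """Approximate (H + damping*I)^{-1} v via conjugate gradient."""
    x = np.zeros_like(v)
    r = v.copy()          # residual for the initial guess x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hess_vec(p) + damping * p
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Sanity check against a dense solve on a small synthetic Hessian.
rng = np.random.default_rng(0)
H = rng.standard_normal((100, 100))
H = H.T @ H + np.eye(100)             # symmetric positive definite
v = rng.standard_normal(100)
x = ihvp_cg(lambda p: H @ p, v, damping=0.0)
print(np.linalg.norm(H @ x - v))      # close to zero
```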
arXiv Detail & Related papers (2021-07-07T17:01:34Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
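As a minimal illustration of the sketching setup, the snippet below solves L2-regularized least squares with a plain Gaussian embedding of fixed sketch size m; the effective-dimension-adaptive choice of m and the SRHT variant studied in the paper are not included, so m is simply an assumed constant here.

```python
# Sketched ridge regression with a Gaussian embedding of fixed size m.
# The paper's effective-dimension-adaptive sketch size and SRHT variant
# are not shown; m here is an assumed constant for illustration.
import numpy as np

def sketched_ridge(A, b, lam, m, rng):
    """Solve min_x ||S(Ax - b)||^2 + lam*||x||^2 with a Gaussian sketch S."""
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding
    SA, Sb = S @ A, S @ b
    # Normal equations of the sketched regularized problem.
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
b = A @ rng.standard_normal(50) + 0.1 * rng.standard_normal(5000)
x_sketch = sketched_ridge(A, b, lam=1.0, m=400, rng=rng)
x_exact = np.linalg.solve(A.T @ A + 1.0 * np.eye(50), A.T @ b)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))  # small relative error
```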
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)