Average-case Acceleration Through Spectral Density Estimation
- URL: http://arxiv.org/abs/2002.04756v6
- Date: Wed, 15 Dec 2021 13:18:31 GMT
- Title: Average-case Acceleration Through Spectral Density Estimation
- Authors: Fabian Pedregosa, Damien Scieur
- Abstract summary: We develop a framework for the average-case analysis of random quadratic problems.
We derive algorithms that are optimal under this analysis.
We develop explicit algorithms for the uniform, Marchenko-Pastur, and exponential distributions.
- Score: 35.01931431231649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a framework for the average-case analysis of random quadratic
problems and derive algorithms that are optimal under this analysis. This
yields a new class of methods that achieve acceleration given a model of the
Hessian's eigenvalue distribution. We develop explicit algorithms for the
uniform, Marchenko-Pastur, and exponential distributions. These methods are
momentum-based algorithms, whose hyper-parameters can be estimated without
knowledge of the Hessian's smallest singular value, in contrast with classical
accelerated methods like Nesterov acceleration and Polyak momentum. Through
empirical benchmarks on quadratic and logistic regression problems, we identify
regimes in which the proposed methods improve over classical (worst-case)
accelerated methods.
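
The abstract's key idea is that momentum hyper-parameters can be set from a model of the Hessian's eigenvalue distribution rather than from its measured extreme eigenvalues. As a rough illustration (a minimal sketch, not the paper's average-case optimal method), the code below runs classical Polyak heavy-ball on a random least-squares problem, taking the step size and momentum from the Marchenko-Pastur support edges implied by the problem's aspect ratio; all dimensions and constants are arbitrary choices.

```python
# Minimal sketch: Polyak heavy-ball on a random quadratic, with step size and
# momentum set from a Marchenko-Pastur *model* of the Hessian spectrum instead
# of its exact extreme eigenvalues.
import numpy as np

n, d = 500, 250
rng = np.random.default_rng(0)
A = rng.standard_normal((n, d)) / np.sqrt(n)   # H = A^T A has an MP spectrum
b = rng.standard_normal(n)

# Marchenko-Pastur support edges for ratio r = d/n and unit variance:
# lambda_{min,max} = (1 -+ sqrt(r))^2. These stand in for mu and L.
r = d / n
mu, L = (1 - np.sqrt(r)) ** 2, (1 + np.sqrt(r)) ** 2

# Classical heavy-ball hyper-parameters computed from the modeled edges.
step = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
momentum = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x = np.zeros(d)
x_prev = x.copy()
for _ in range(200):
    grad = A.T @ (A @ x - b)                   # gradient of 0.5*||Ax - b||^2
    x, x_prev = x - step * grad + momentum * (x - x_prev), x
print(np.linalg.norm(A.T @ (A @ x - b)))       # gradient norm at the end
```

The paper's methods go further than this fixed-parameter scheme: they derive algorithms that are optimal on average under the assumed spectral density, rather than tuned to the worst case over the support edges.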
Related papers
- The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines [1.738375118265695]
This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs).
It not only achieves faster progress per iteration but also exhibits enhanced convergence compared to conventional stochastic first-order (SFO) techniques.
Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods.
arXiv Detail & Related papers (2024-07-30T17:03:19Z)
- A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods [5.779838187603272]
We propose a novel analysis framework, based on the Kurdyka-Łojasiewicz property, for non-descent-type optimization methods.
arXiv Detail & Related papers (2024-06-04T12:49:46Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
Using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational integrators, which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z)
- Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates (a minimal restart sketch appears after this list).
arXiv Detail & Related papers (2021-01-23T17:58:25Z)
- Differentially Private Accelerated Optimization Algorithms [0.7874708385247353]
We present two classes of differentially private optimization algorithms.
The first algorithm is inspired by Polyak's heavy ball method.
The second class of algorithms is based on Nesterov's accelerated gradient method.
arXiv Detail & Related papers (2020-08-05T08:23:01Z)
- IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z)
- Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
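
As a concrete companion to the restart discussion in the "Acceleration Methods" entry above, here is a minimal sketch of Nesterov's accelerated gradient with a function-value adaptive restart on an arbitrary random quadratic. The restart criterion (reset momentum whenever the objective increases) is one common heuristic, assumed here for illustration; this is not code from the survey.

```python
# Minimal sketch: Nesterov's accelerated gradient with a function-value
# adaptive restart. Momentum is reset whenever the objective increases,
# a standard heuristic for recovering fast rates without knowing the
# strong-convexity constant.
import numpy as np

rng = np.random.default_rng(1)
d = 100
H = rng.standard_normal((d, d))
H = H @ H.T / d + 0.01 * np.eye(d)       # positive-definite Hessian
b = rng.standard_normal(d)

f = lambda x: 0.5 * x @ H @ x - b @ x    # quadratic objective
grad = lambda x: H @ x - b
step = 1.0 / np.linalg.eigvalsh(H)[-1]   # step size 1/L (largest eigenvalue)

x = y = np.zeros(d)
t, f_prev = 1.0, np.inf
for _ in range(300):
    x_next = y - step * grad(y)          # gradient step from extrapolated point
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_next + (t - 1) / t_next * (x_next - x)
    if f(x_next) > f_prev:               # restart: kill momentum on an increase
        y, t_next = x_next, 1.0
    x, t, f_prev = x_next, t_next, f(x_next)
print(np.linalg.norm(grad(x)))           # gradient norm at the end
```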