Randomized Forward Mode of Automatic Differentiation For Optimization
Algorithms
- URL: http://arxiv.org/abs/2310.14168v3
- Date: Thu, 1 Feb 2024 20:33:02 GMT
- Title: Randomized Forward Mode of Automatic Differentiation For Optimization
Algorithms
- Authors: Khemraj Shukla and Yeonjong Shin
- Abstract summary: We present a randomized forward mode gradient (RFG) as an alternative to backpropagation.
The probability distribution of the random vector determines the statistical properties of RFG.
By replacing the gradient with RFG, a class of RFG-based optimization algorithms is obtained.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a randomized forward mode gradient (RFG) as an alternative to
backpropagation. RFG is a random estimator for the gradient that is constructed
based on the directional derivative along a random vector. The forward mode
automatic differentiation (AD) provides an efficient computation of RFG. The
probability distribution of the random vector determines the statistical
properties of RFG. Through the second moment analysis, we found that the
distribution with the smallest kurtosis yields the smallest expected relative
squared error. By replacing the gradient with RFG, a class of RFG-based
optimization algorithms is obtained. By focusing on gradient descent (GD) and
Polyak's heavy ball (PHB) methods, we present a convergence analysis of
RFG-based optimization algorithms for quadratic functions. Computational
experiments are presented to demonstrate the performance of the proposed
algorithms and verify the theoretical findings.
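For concreteness, below is a minimal sketch of how an RFG-based gradient descent step could be implemented with forward-mode AD in JAX. The quadratic test problem, the Gaussian choice of random direction, and all function names are illustrative assumptions, not the authors' reference implementation; the paper also analyzes other direction distributions and Polyak's heavy ball updates.

```python
# Minimal sketch (assumptions: JAX, Gaussian random directions, a toy quadratic).
import jax
import jax.numpy as jnp

def rfg(f, x, key):
    """Randomized forward-mode gradient: (v . grad f(x)) v for a random direction v."""
    # The distribution of v is a design choice; the paper studies how it
    # (e.g., its kurtosis) affects the estimator's second moment.
    v = jax.random.normal(key, x.shape)
    _, dfdv = jax.jvp(f, (x,), (v,))   # directional derivative via forward-mode AD
    return dfdv * v                    # E[(v . grad f) v] = grad f for standard normal v

def rfg_gd(f, x0, lr=1e-2, steps=2000, seed=0):
    """Plain gradient descent with the true gradient replaced by RFG."""
    x, key = x0, jax.random.PRNGKey(seed)
    for _ in range(steps):
        key, sub = jax.random.split(key)
        x = x - lr * rfg(f, x, sub)
    return x

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose exact minimizer is A^{-1} b.
A = jnp.diag(jnp.array([1.0, 5.0, 10.0]))
b = jnp.ones(3)
f = lambda x: 0.5 * x @ A @ x - b @ x

print(rfg_gd(f, jnp.zeros(3)))   # approximate minimizer from RFG-GD
print(jnp.linalg.solve(A, b))    # exact minimizer for comparison
```

An RFG-based heavy ball variant would add a momentum term beta * (x_k - x_{k-1}) to the same update while keeping the RFG estimator in place of the gradient.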
Related papers
- Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction [0.0]
This paper aims to enhance the use of the Frank-Wolfe (FW) algorithm for training deep neural networks.
Like other gradient-based algorithms, FW suffers from high computational and memory costs when applied to DNNs.
arXiv Detail & Related papers (2024-03-19T07:25:36Z) - Sliced Wasserstein with Random-Path Projecting Directions [49.802024788196434]
We propose an optimization-free slicing distribution that provides a fast sampling for the Monte Carlo estimation of expectation.
We derive the random-path slicing distribution (RPSD) and two variants of sliced Wasserstein, i.e., the Random-Path Projection Sliced Wasserstein (RPSW) and the Importance Weighted Random-Path Projection Sliced Wasserstein (IWRPSW)
arXiv Detail & Related papers (2024-01-29T04:59:30Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can be run with simultaneously inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics [13.440621354486906]
This paper provides a finite-sample analysis of a passive gradient Langevin dynamics (PSGLD) algorithm.
Adaptive IRL aims to estimate the cost function of a forward learner performing a gradient algorithm.
arXiv Detail & Related papers (2023-04-18T16:39:51Z) - The Novel Adaptive Fractional Order Gradient Decent Algorithms Design
via Robust Control [5.5491171448159715]
The vanilla fractional order gradient descent may oscillate within a region around the global minimum instead of converging to the exact minimum point, or even diverge, in the case where the objective function is strongly convex.
To address this problem, a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method are proposed in this paper.
arXiv Detail & Related papers (2023-03-08T02:03:30Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions [15.33098084159285]
This paper addresses the problem of optimizing partition functions in a learning setting.
We propose a variant of the bound majorization algorithm that relies on upper-bounding the partition function with a quadratic surrogate.
arXiv Detail & Related papers (2020-11-03T04:42:51Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles arise in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation samples.
arXiv Detail & Related papers (2020-02-26T12:53:04Z)