Randomized Forward Mode of Automatic Differentiation For Optimization
Algorithms
- URL: http://arxiv.org/abs/2310.14168v3
- Date: Thu, 1 Feb 2024 20:33:02 GMT
- Title: Randomized Forward Mode of Automatic Differentiation For Optimization
Algorithms
- Authors: Khemraj Shukla and Yeonjong Shin
- Abstract summary: We present a randomized forward mode gradient (RFG) as an alternative to backpropagation.
The probability distribution of the random vector determines the statistical properties of RFG.
By replacing the gradient with RFG, a class of RFG-based optimization algorithms is obtained.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a randomized forward mode gradient (RFG) as an alternative to
backpropagation. RFG is a random estimator for the gradient that is constructed
based on the directional derivative along a random vector. The forward mode
automatic differentiation (AD) provides an efficient computation of RFG. The
probability distribution of the random vector determines the statistical
properties of RFG. Through the second moment analysis, we found that the
distribution with the smallest kurtosis yields the smallest expected relative
squared error. By replacing the gradient with RFG, a class of RFG-based
optimization algorithms is obtained. By focusing on gradient descent (GD) and
Polyak's heavy ball (PHB) methods, we present a convergence analysis of
RFG-based optimization algorithms for quadratic functions. Computational
experiments are presented to demonstrate the performance of the proposed
algorithms and verify the theoretical findings.
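For concreteness, below is a minimal sketch of how an RFG-based gradient descent step could be implemented with forward-mode AD in JAX. The quadratic test problem, the Gaussian choice of random direction, and all function names are illustrative assumptions, not the authors' reference implementation; the paper also analyzes other direction distributions and Polyak's heavy ball updates.

```python
# Minimal sketch (assumptions: JAX, Gaussian random directions, a toy quadratic).
import jax
import jax.numpy as jnp

def rfg(f, x, key):
    """Randomized forward-mode gradient: (v . grad f(x)) v for a random direction v."""
    # The distribution of v is a design choice; the paper studies how it
    # (e.g., its kurtosis) affects the estimator's second moment.
    v = jax.random.normal(key, x.shape)
    _, dfdv = jax.jvp(f, (x,), (v,))   # directional derivative via forward-mode AD
    return dfdv * v                    # E[(v . grad f) v] = grad f for standard normal v

def rfg_gd(f, x0, lr=1e-2, steps=2000, seed=0):
    """Plain gradient descent with the true gradient replaced by RFG."""
    x, key = x0, jax.random.PRNGKey(seed)
    for _ in range(steps):
        key, sub = jax.random.split(key)
        x = x - lr * rfg(f, x, sub)
    return x

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose exact minimizer is A^{-1} b.
A = jnp.diag(jnp.array([1.0, 5.0, 10.0]))
b = jnp.ones(3)
f = lambda x: 0.5 * x @ A @ x - b @ x

print(rfg_gd(f, jnp.zeros(3)))   # approximate minimizer from RFG-GD
print(jnp.linalg.solve(A, b))    # exact minimizer for comparison
```

An RFG-based heavy ball variant would add a momentum term beta * (x_k - x_{k-1}) to the same update while keeping the RFG estimator in place of the gradient.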
Related papers
- Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction [0.0]
This paper aims to enhance the use of the Frank-Wolfe (FW) algorithm for training deep neural networks.
Like other gradient-based algorithms, FW suffers from high computational and memory costs when applied to DNNs.
arXiv Detail & Related papers (2024-03-19T07:25:36Z) - Sliced Wasserstein with Random-Path Projecting Directions [49.802024788196434]
We propose an optimization-free slicing distribution that provides a fast sampling for the Monte Carlo estimation of expectation.
We derive the random-path slicing distribution (RPSD) and two variants of sliced Wasserstein, i.e., the Random-Path Projection Sliced Wasserstein (RPSW) and the Importance Weighted Random-Path Projection Sliced Wasserstein (IWRPSW)
arXiv Detail & Related papers (2024-01-29T04:59:30Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can be run with simultaneously inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics [13.440621354486906]
This paper provides a finite-sample analysis of a passive gradient Langevin dynamics (PSGLD) algorithm.
Adaptive IRL aims to estimate the cost function of a forward learner performing a gradient algorithm.
arXiv Detail & Related papers (2023-04-18T16:39:51Z) - The Novel Adaptive Fractional Order Gradient Decent Algorithms Design
via Robust Control [5.5491171448159715]
The vanilla fractional order gradient descent may oscillate within a region around the global minimum instead of converging to the exact minimum point, or even diverge, in the case where the objective function is strongly convex.
To address this problem, a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method are proposed in this paper.
arXiv Detail & Related papers (2023-03-08T02:03:30Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions [15.33098084159285]
This paper addresses the problem of optimizing partition functions in a learning setting.
We propose a variant of the bound majorization algorithm that relies on upper-bounding the partition function with a quadratic surrogate.
arXiv Detail & Related papers (2020-11-03T04:42:51Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles arise in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation samples.
arXiv Detail & Related papers (2020-02-26T12:53:04Z)