Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient
Estimation
- URL: http://arxiv.org/abs/2003.13881v2
- Date: Tue, 27 Oct 2020 16:00:16 GMT
- Title: Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient
Estimation
- Authors: Abdulrahman Alabdulkareem and Jean Honorio
- Abstract summary: We analyze the necessary number of samples to estimate the gradient of any multidimensional smooth function in a zero-order oracle model.
We show that the finite difference method for a bounded-variance oracle has rate $O(d^{4/3}/\sqrt{T})$ for functions with zero third and higher order derivatives.
- Score: 28.776950569604026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we analyze the necessary number of samples to estimate the
gradient of any multidimensional smooth (possibly non-convex) function in a
zero-order stochastic oracle model. In this model, an estimator has access to
noisy values of the function, in order to produce the estimate of the gradient.
We also provide an analysis on the sufficient number of samples for the finite
difference method, a classical technique in numerical linear algebra. For $T$
samples and $d$ dimensions, our information-theoretic lower bound is
$\Omega(\sqrt{d/T})$. We show that the finite difference method for a
bounded-variance oracle has rate $O(d^{4/3}/\sqrt{T})$ for functions with zero
third and higher order derivatives. These rates are tight for Gaussian oracles.
Thus, the finite difference method is not minimax optimal, and therefore there
is space for the development of better gradient estimation methods.
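To make the zero-order oracle model concrete, below is a minimal sketch of the classical central finite-difference estimator the paper analyzes. This is an illustration, not the authors' code: the even split of the $T$ oracle calls across coordinate pairs, the step size, and all names are assumptions for the example.

```python
import numpy as np

def fd_gradient(oracle, x, T, h=1e-3):
    """Central finite-difference gradient estimate from a noisy
    zero-order oracle, using roughly T oracle calls in total.

    oracle(x) returns f(x) plus zero-mean, bounded-variance noise;
    h is the finite-difference step. Names here are illustrative.
    """
    d = x.shape[0]
    reps = max(1, T // (2 * d))  # oracle calls per side, per coordinate
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        # Average repeated noisy evaluations to reduce oracle variance.
        f_plus = np.mean([oracle(x + e) for _ in range(reps)])
        f_minus = np.mean([oracle(x - e) for _ in range(reps)])
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Example: f(x) = 0.5 ||x||^2 with additive Gaussian oracle noise,
# so the true gradient at x is x itself.
rng = np.random.default_rng(0)
noisy_f = lambda z: 0.5 * np.sum(z ** 2) + rng.normal(scale=0.1)
print(fd_gradient(noisy_f, np.ones(5), T=10_000))
```

The averaging step is what trades oracle calls for variance; the paper's point is that even with this trade-off optimally tuned, the estimator's $O(d^{4/3}/\sqrt{T})$ rate does not match the $\Omega(\sqrt{d/T})$ lower bound.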
Related papers
- Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization [48.84672493756553]
We propose a query-efficient and accurate estimator for gradients that uses only $O\big(s\log\frac{d}{s}\big)$ queries per step.
Our proposed GraCe generalizes the Indyk--Price--Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions.
arXiv Detail & Related papers (2024-05-27T03:52:53Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the PAGE algorithm so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
- Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high probability generalization bound of order $O(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2)$ for gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
arXiv Detail & Related papers (2022-05-27T07:23:01Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization problems in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
- Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
Gradients and Adaptive Sampling [29.600975900977343]
We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
arXiv Detail & Related papers (2020-03-29T11:01:17Z)
- A Sub-sampled Tensor Method for Non-convex Optimization [0.0]
We present a method that uses a fourth-order regularized model to find local minima of smooth and potentially nonconvex objective functions with a finite-sum structure.
arXiv Detail & Related papers (2019-11-23T13:46:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.