Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient
Estimation
- URL: http://arxiv.org/abs/2003.13881v2
- Date: Tue, 27 Oct 2020 16:00:16 GMT
- Title: Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient
Estimation
- Authors: Abdulrahman Alabdulkareem and Jean Honorio
- Abstract summary: We analyze the necessary number of samples to estimate the gradient of any multidimensional smooth function in a zero-order oracle model.
We show that the finite difference method for a bounded-variance oracle has rate $O(d^{4/3}/\sqrt{T})$ for functions with zero third and higher order derivatives.
- Score: 28.776950569604026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we analyze the necessary number of samples to estimate the
gradient of any multidimensional smooth (possibly non-convex) function in a
zero-order stochastic oracle model. In this model, an estimator has access to
noisy values of the function, in order to produce the estimate of the gradient.
We also provide an analysis on the sufficient number of samples for the finite
difference method, a classical technique in numerical linear algebra. For $T$
samples and $d$ dimensions, our information-theoretic lower bound is
$\Omega(\sqrt{d/T})$. We show that the finite difference method for a
bounded-variance oracle has rate $O(d^{4/3}/\sqrt{T})$ for functions with zero
third and higher order derivatives. These rates are tight for Gaussian oracles.
Thus, the finite difference method is not minimax optimal, and therefore there
is space for the development of better gradient estimation methods.
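To make the zero-order oracle model concrete, below is a minimal sketch of the classical central finite-difference estimator the paper analyzes. This is an illustration, not the authors' code: the even split of the $T$ oracle calls across coordinate pairs, the step size, and all names are assumptions for the example.

```python
import numpy as np

def fd_gradient(oracle, x, T, h=1e-3):
    """Central finite-difference gradient estimate from a noisy
    zero-order oracle, using roughly T oracle calls in total.

    oracle(x) returns f(x) plus zero-mean, bounded-variance noise;
    h is the finite-difference step. Names here are illustrative.
    """
    d = x.shape[0]
    reps = max(1, T // (2 * d))  # oracle calls per side, per coordinate
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        # Average repeated noisy evaluations to reduce oracle variance.
        f_plus = np.mean([oracle(x + e) for _ in range(reps)])
        f_minus = np.mean([oracle(x - e) for _ in range(reps)])
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Example: f(x) = 0.5 ||x||^2 with additive Gaussian oracle noise,
# so the true gradient at x is x itself.
rng = np.random.default_rng(0)
noisy_f = lambda z: 0.5 * np.sum(z ** 2) + rng.normal(scale=0.1)
print(fd_gradient(noisy_f, np.ones(5), T=10_000))
```

The averaging step is what trades oracle calls for variance; the paper's point is that even with this trade-off optimally tuned, the estimator's $O(d^{4/3}/\sqrt{T})$ rate does not match the $\Omega(\sqrt{d/T})$ lower bound.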
Related papers
- Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization [48.84672493756553]
We propose a query-efficient and accurate estimator for gradients that uses only $O\big(s\log\frac{d}{s}\big)$ queries per step.
Our proposed GraCe generalizes the Indyk--Price--Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions.
arXiv Detail & Related papers (2024-05-27T03:52:53Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the PAGE algorithm so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
- Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high probability generalization bound of order $O(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2)$ for gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
arXiv Detail & Related papers (2022-05-27T07:23:01Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization problems in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
- Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
Gradients and Adaptive Sampling [29.600975900977343]
We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
arXiv Detail & Related papers (2020-03-29T11:01:17Z)
- A Sub-sampled Tensor Method for Non-convex Optimization [0.0]
We present a method that uses a fourth-order regularized model to find local minima of smooth and potentially nonconvex objective functions with a finite-sum structure.
arXiv Detail & Related papers (2019-11-23T13:46:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.