Stochastic Zeroth Order Gradient and Hessian Estimators: Variance
Reduction and Refined Bias Bounds
- URL: http://arxiv.org/abs/2205.14737v3
- Date: Thu, 30 Mar 2023 13:46:04 GMT
- Title: Stochastic Zeroth Order Gradient and Hessian Estimators: Variance
Reduction and Refined Bias Bounds
- Authors: Yasong Feng, Tianyu Wang
- Abstract summary: We study zeroth order and Hessian estimators for real-valued functions in $mathbbRn$.
We show that, via taking finite difference along random directions, the variance of gradient finite difference estimators can be significantly reduced.
- Score: 6.137707924685666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study stochastic zeroth order gradient and Hessian estimators for
real-valued functions in $\mathbb{R}^n$. We show that, via taking finite
difference along random orthogonal directions, the variance of the stochastic
finite difference estimators can be significantly reduced. In particular, we
design estimators for smooth functions such that, if one uses $ \Theta \left( k
\right) $ random directions sampled from the Stiefel manifold $ \text{St}
(n,k) $ and finite-difference granularity $\delta$, the variance of the
gradient estimator is bounded by $ \mathcal{O} \left( \left( \frac{n}{k} - 1
\right) + \left( \frac{n^2}{k} - n \right) \delta^2 + \frac{ n^2 \delta^4 }{ k
} \right) $, and the variance of the Hessian estimator is bounded by
$\mathcal{O} \left( \left( \frac{n^2}{k^2} - 1 \right) + \left( \frac{n^4}{k^2}
- n^2 \right) \delta^2 + \frac{n^4 \delta^4 }{k^2} \right) $. When $k = n$, the
variances become negligibly small. In addition, we provide improved bias bounds
for the estimators. The bias of both gradient and Hessian estimators for smooth
function $f$ is of order $\mathcal{O} \left( \delta^2 \Gamma \right)$, where
$\delta$ is the finite-difference granularity, and $ \Gamma $ depends on high
order derivatives of $f$. Our results are evidenced by empirical observations.
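As a rough illustration of the construction described above, the following Python sketch builds finite-difference gradient and Hessian estimators that average over $k$ random orthonormal directions, obtained as the columns of a matrix drawn from $\text{St}(n,k)$ via a QR factorization of a Gaussian matrix. The weighting constants come from standard uniform-on-the-sphere moment identities and are assumptions of this sketch rather than the paper's exact estimators; the names `sample_stiefel`, `zo_gradient`, and `zo_hessian` are hypothetical.

```python
import numpy as np

def sample_stiefel(n, k, rng):
    """Draw a matrix with k orthonormal columns (a point on St(n, k))
    via QR factorization of an n-by-k Gaussian matrix (approximately
    uniform on the Stiefel manifold, sign conventions aside)."""
    G = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(G)
    return Q  # shape (n, k), Q.T @ Q = I_k

def zo_gradient(f, x, delta, k, rng):
    """Central-difference gradient estimator averaged over k random
    orthonormal directions. Each column v is marginally uniform on the
    unit sphere, and E[n * (grad_f . v) * v] = grad_f, so the estimate
    is unbiased up to the O(delta^2) finite-difference error."""
    n = x.size
    V = sample_stiefel(n, k, rng)
    g = np.zeros(n)
    for i in range(k):
        v = V[:, i]
        diff = (f(x + delta * v) - f(x - delta * v)) / (2.0 * delta)
        g += diff * v
    return (n / k) * g

def zo_hessian(f, x, delta, k, rng):
    """Second-order central-difference Hessian estimator averaged over k
    random orthonormal directions, using the spherical moment identity
    E[(v^T H v) * ((n + 2) v v^T - I)] = (2 / n) H."""
    n = x.size
    V = sample_stiefel(n, k, rng)
    fx = f(x)
    H = np.zeros((n, n))
    for i in range(k):
        v = V[:, i]
        second_diff = (f(x + delta * v) + f(x - delta * v) - 2.0 * fx) / delta**2
        H += second_diff * ((n + 2) * np.outer(v, v) - np.eye(n))
    return (n / (2.0 * k)) * H

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def f(x):  # simple quadratic test: gradient is x, Hessian is I
        return 0.5 * x @ x

    x0 = np.ones(10)
    print(zo_gradient(f, x0, delta=1e-4, k=10, rng=rng))          # approx x0
    print(np.round(zo_hessian(f, x0, delta=1e-3, k=10, rng=rng), 2))  # approx I
```

Setting $k = n$ makes the sampled directions a full random orthonormal basis, so the gradient estimator reduces to central differences along a randomly rotated coordinate frame, which is consistent with the abstract's remark that the variance becomes negligibly small when $k = n$.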
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that no natural first-order method achieves $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization [20.54801745090522]
We consider the problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi}[F(x; \xi)]$, where the component $F(x; \xi)$ is Lipschitz.
The recently proposed gradient-free method requires at most $\mathcal{O}\left( L^4 d^{3/2} \epsilon^{-4} + L d^{3/2} \delta^{-1} \epsilon^{-4} \right)$ zeroth-order complexity to find a $(\delta,\epsilon)$-Goldstein stationary point.
arXiv Detail & Related papers (2023-01-16T13:33:37Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning the graphical structure of a $p$-dimensional random vector $X \in \mathbb{R}^p$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z)
- Towards Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian Manifolds [5.7569764948783355]
We introduce new zeroth-order Hessian estimators using $O (1)$ function evaluations.
For a smooth real-valued function $f$ with Lipschitz Hessian, our estimator achieves a bias bound of order $O\left( L \delta + \gamma \delta^2 \right)$.
arXiv Detail & Related papers (2022-01-26T07:09:25Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\epsilon)$.
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
- Multivariate mean estimation with direction-dependent accuracy [8.147652597876862]
We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations.
We present an estimator that has near-optimal error in all directions in which the variance of the one-dimensional marginal of the random vector is not too small.
arXiv Detail & Related papers (2020-10-22T17:52:45Z)
- GO Hessian for Expectation-Based Objectives [73.06986780804269]
The GO gradient was proposed recently for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$.
Based on the GO gradient, we present for $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$ an unbiased low-variance Hessian estimator, named GO Hessian.
arXiv Detail & Related papers (2020-06-16T02:20:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.