Towards Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian
Manifolds
- URL: http://arxiv.org/abs/2201.10780v1
- Date: Wed, 26 Jan 2022 07:09:25 GMT
- Title: Towards Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian
Manifolds
- Authors: Tianyu Wang
- Abstract summary: We introduce new zeroth-order Hessian estimators using $O(1)$ function evaluations.
For a smooth real-valued function $f$ with Lipschitz Hessian, our estimator achieves a bias bound of order $O\left( L_2 \delta + \gamma \delta^2 \right)$.
- Score: 5.7569764948783355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Hessian estimators for real-valued functions defined over an
$n$-dimensional complete Riemannian manifold. We introduce new stochastic
zeroth-order Hessian estimators using $O (1)$ function evaluations. We show
that, for a smooth real-valued function $f$ with Lipschitz Hessian (with
respect to the Riemannian metric), our estimator achieves a bias bound of order
$ O \left( L_2 \delta + \gamma \delta^2 \right) $, where $ L_2 $ is the
Lipschitz constant for the Hessian, $ \gamma $ depends on both the Levi-Civita
connection and function $f$, and $\delta$ is the finite difference step size.
To the best of our knowledge, our results provide the first bias bound for
Hessian estimators that explicitly depends on the geometry of the underlying
Riemannian manifold. Perhaps more importantly, our bias bound does not increase
with dimension $n$. This improves on the best previously known bias bound for
$O(1)$-evaluation Hessian estimators, which increases quadratically with $n$.
We also study downstream computations based on our Hessian estimators. The
superiority of our method is demonstrated by empirical evaluations.
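To make the quantities in the abstract concrete, the following is a minimal sketch of an $O(1)$-evaluation stochastic Hessian estimator built from central finite differences along two random unit directions. It is a Euclidean analogue under simplifying assumptions (flat space, uniform sphere sampling), not the paper's Riemannian construction, and all names and constants are hypothetical.

```python
import numpy as np

def zo_hessian_estimate(f, x, delta, rng=None):
    """One-sample zeroth-order Hessian estimate of f at x.

    Euclidean analogue for intuition only: on a Riemannian manifold the
    perturbation x + delta*u would be replaced by the exponential map
    exp_x(delta*u), and the bias O(L_2*delta + gamma*delta^2) is measured
    with respect to the Riemannian metric. Uses O(1) (here, four) function
    evaluations per sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # Two independent directions drawn uniformly from the unit sphere.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    # Second-order central difference: d / (4*delta^2) approximates u^T Hess f(x) v.
    d = (f(x + delta * u + delta * v) - f(x + delta * u - delta * v)
         - f(x - delta * u + delta * v) + f(x - delta * u - delta * v))
    quad = d / (4.0 * delta ** 2)
    # The n^2 factor corrects for E[u u^T] = I / n under uniform sphere sampling,
    # so the expectation matches the (symmetrized) Hessian up to finite-difference bias.
    return n ** 2 * quad * 0.5 * (np.outer(u, v) + np.outer(v, u))

# Quick sanity check on a quadratic, whose Hessian is exactly A.
if __name__ == "__main__":
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    f = lambda z: 0.5 * z @ A @ z
    samples = [zo_hessian_estimate(f, np.zeros(2), 1e-3) for _ in range(20000)]
    print(np.mean(samples, axis=0))  # should be close to A
```

In the manifold setting of the paper, the additive perturbations would be replaced by the exponential map at $x$ and the estimate interpreted in the tangent space; the stated bias bound $O\left( L_2 \delta + \gamma \delta^2 \right)$ refers to that setting, not to this simplified sketch.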
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^2 (1-\gamma)^4}\right)$ suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat{\eta}^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference for a wide class of statistical functionals of $\eta^\pi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded
Geometric Penalties [21.141544548229774]
We study problems of the form $\min_x \max_y f(x, y)$, where the underlying domains are Hadamard manifolds.
We obtain accelerated methods with global convergence guarantees and bounded geometric penalties.
arXiv Detail & Related papers (2023-05-25T15:43:07Z) - Estimating the minimizer and the minimum value of a regression function
under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z) - Stochastic Zeroth Order Gradient and Hessian Estimators: Variance
Reduction and Refined Bias Bounds [6.137707924685666]
We study zeroth order gradient and Hessian estimators for real-valued functions in $\mathbb{R}^n$.
We show that, by taking finite differences along random directions, the variance of gradient finite-difference estimators can be significantly reduced (a minimal sketch of this random-direction estimator appears after this list).
arXiv Detail & Related papers (2022-05-29T18:53:24Z) - Storage and retrieval of von Neumann measurements [0.0]
This work examines the problem of learning an unknown von Neumann measurement of dimension $d$ from a finite number of copies.
Our main goal is to estimate the behavior of the maximum value of the average fidelity function $F_d$ for a general $N \rightarrow 1$ learning scheme.
arXiv Detail & Related papers (2022-04-06T18:19:04Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\ldots$
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - A One-bit, Comparison-Based Gradient Estimator [29.600975900977343]
We show how one can use tools from one-bit compressed sensing to construct a robust and reliable estimator of the normalized gradient.
We propose an algorithm, coined SCOBO, that uses this estimator within a gradient descent scheme.
arXiv Detail & Related papers (2020-10-06T05:01:38Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
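As referenced above for the related paper on stochastic zeroth order gradient and Hessian estimators, here is a minimal sketch of the generic random-direction finite-difference gradient estimator in $\mathbb{R}^n$. It is not that paper's exact construction; the function name, the uniform sphere sampling, and the averaging scheme are assumptions made for illustration.

```python
import numpy as np

def zo_gradient_estimate(f, x, delta, num_dirs=4, rng=None):
    """Central finite-difference gradient estimate along random unit directions.

    Generic sketch (not the cited paper's exact estimator): each direction u
    contributes n * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u, whose
    expectation matches the gradient up to finite-difference bias; averaging
    over num_dirs independent directions reduces the variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    g = np.zeros(n)
    for _ in range(num_dirs):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        g += n * (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / num_dirs
```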