Finite-Sample Symmetric Mean Estimation with Fisher Information Rate
- URL: http://arxiv.org/abs/2306.16573v1
- Date: Wed, 28 Jun 2023 21:31:46 GMT
- Title: Finite-Sample Symmetric Mean Estimation with Fisher Information Rate
- Authors: Shivam Gupta, Jasper C.H. Lee, Eric Price
- Abstract summary: Mean of an unknown variance-$sigma2$ distribution can be estimated from $n$ samples with variance $fracsigma2n$ and nearly corresponding subgaussian rate.
When $f$ is known up to translation, this can be improved to $frac1nmathcal I$, where $mathcal I$ is the Fisher information of the distribution.
- Score: 15.802475232604667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated
from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding
subgaussian rate. When $f$ is known up to translation, this can be improved
asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher
information of the distribution. Such an improvement is not possible for
general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence
$\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's
bound is asymptotic, however: the $n$ required for convergence depends in an
unspecified way on the distribution $f$ and failure probability $\delta$. In
this paper we give finite-sample guarantees for symmetric mean estimation in
terms of Fisher information. For every $f, n, \delta$ with $n > \log
\frac{1}{\delta}$, we get convergence close to a subgaussian with variance
$\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$
Fisher information with smoothing radius $r$ that decays polynomially in $n$.
Such a bound essentially matches the finite-sample guarantees in the known-$f$
setting.
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers the RMSProp and its momentum extension and establishes the convergence rate of $frac1Tsum_k=1T.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered to be analogous to the $frac1Tsum_k=1T.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $widetilde Oleft(frac|mathcalS||mathcalA|epsilon2 (1-gamma)4right)$ suffices to ensure the Kolmogorov metric and total variation metric between $hatetapi$ and $etapi$ is below $epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $etapi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - High-dimensional Location Estimation via Norm Concentration for Subgamma
Vectors [15.802475232604667]
In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $lambda$.
Asymptotically, the maximum likelihood estimate achieves the Cram'er-Rao bound of error $mathcal N(0, frac1nmathcal I)$.
We build on the theory using emphsmoothed estimators to bound the error for finite $n$ in terms of $mathcal I_r$, the Fisher information of the $r$-smoothed
arXiv Detail & Related papers (2023-02-05T22:17:04Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(Sigma,Xi) nor the operator norm of $Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Finite-Sample Maximum Likelihood Estimation of Location [16.44999338864628]
For fixed $f$ the maximum-likelihood estimate (MLE) is well-known to be optimal in the limit as $n to infty$
We show for arbitrary $f$ and $n$ that one can recover a similar theory based on the Fisher information of a smoothed version of $f$, where the smoothing radius decays with $n$.
arXiv Detail & Related papers (2022-06-06T04:33:41Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$.
We propose a novel random technique, called Ternary Random Feature (TRF)
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical
Approximation and Statistical Applications [18.618590805279187]
We show that $mathsfW_p(sigma)$ is controlled by a $pth order smooth dual Sobolev $mathsfd_p(sigma)$.
We derive the limit distribution of $sqrtnmathsfd_p(sigma)(hatmu_n,mu)$ in all dimensions $d$, when $mu$ is sub-Gaussian.
arXiv Detail & Related papers (2021-01-11T17:23:24Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(epsilon,delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator $(epsilon,delta)$-unbiased for $(Atop A)-1$ with a sketch of size $m=O(d+sqrt d/
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.