Optimal Mean Estimation without a Variance
- URL: http://arxiv.org/abs/2011.12433v2
- Date: Tue, 8 Dec 2020 20:31:46 GMT
- Title: Optimal Mean Estimation without a Variance
- Authors: Yeshwanth Cherapanamjeri, Nilesh Tripuraneni, Peter L. Bartlett,
Michael I. Jordan
- Abstract summary: We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n,d,\delta$.
- Score: 103.26777953032537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of heavy-tailed mean estimation in settings where the
variance of the data-generating distribution does not exist. Concretely, given
a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$
over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following
\emph{weak-moment} assumption for some ${\alpha \in [0, 1]}$: \begin{equation*}
\forall \|v\| = 1: \mathbb{E}_{X \thicksim \mathcal{D}}[\lvert \langle X - \mu,
v\rangle \rvert^{1 + \alpha}] \leq 1, \end{equation*} and given a target
failure probability, $\delta$, our goal is to design an estimator which attains
the smallest possible confidence interval as a function of $n,d,\delta$. For
the specific case of $\alpha = 1$, foundational work of Lugosi and Mendelson
exhibits an estimator achieving subgaussian confidence intervals, and
subsequent work has led to computationally efficient versions of this
estimator. Here, we study the case of general $\alpha$, and establish the
following information-theoretic lower bound on the optimal attainable
confidence interval: \begin{equation*} \Omega \left(\sqrt{\frac{d}{n}} +
\left(\frac{d}{n}\right)^{\frac{\alpha}{(1 + \alpha)}} + \left(\frac{\log 1 /
\delta}{n}\right)^{\frac{\alpha}{(1 + \alpha)}}\right). \end{equation*}
Moreover, we devise a computationally-efficient estimator which achieves this
lower bound.
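To make the estimation setting concrete, below is a minimal sketch of a classical median-of-means baseline in NumPy. It is purely illustrative: it is not the computationally-efficient estimator constructed in the paper, and the bucket count, the symmetric Pareto noise model, and all variable names are assumptions made for the example.

import numpy as np

def median_of_means(X, delta=0.01):
    """Median-of-means baseline for heavy-tailed mean estimation (illustrative only).

    X     : (n, d) array of i.i.d. samples.
    delta : target failure probability.
    Splits the sample into roughly log(1/delta) buckets, averages within each
    bucket, and aggregates the bucket means coordinate-wise by the median
    (a simple surrogate for the geometric median used in this sketch).
    """
    n, _ = X.shape
    k = max(1, min(n, int(np.ceil(8 * np.log(1.0 / delta)))))  # bucket count
    buckets = np.array_split(np.random.permutation(n), k)
    bucket_means = np.stack([X[idx].mean(axis=0) for idx in buckets])
    return np.median(bucket_means, axis=0)

# Hypothetical usage on symmetric Pareto noise: the (1 + alpha)-th moment is
# finite for small alpha, but the variance is infinite (tail exponent 1.5).
rng = np.random.default_rng(0)
n, d = 10_000, 5
noise = rng.pareto(1.5, size=(n, d)) * rng.choice([-1.0, 1.0], size=(n, d))
X = 3.0 + noise  # true mean is 3 in every coordinate
print("median-of-means estimate:", median_of_means(X, delta=0.01))
print("naive sample mean:       ", X.mean(axis=0))

Under the weak-moment assumption with $\alpha < 1$, the empirical mean yields much wider confidence intervals at small failure probabilities $\delta$; bucketed aggregation of this kind narrows that gap, and the paper's estimator attains the optimal rate stated in the lower bound above.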
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle calls (FOs) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Estimating the Mixing Coefficients of Geometrically Ergodic Markov
Processes [5.00389879175348]
We estimate the individual $\beta$-mixing coefficients of a real-valued geometrically ergodic Markov process from a single sample-path.
Naturally no density assumptions are required in this setting; the expected error rate is shown to be of order $\mathcal{O}(\log(n)\, n^{-1/2})$.
arXiv Detail & Related papers (2024-02-11T20:17:10Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^2 (1-\gamma)^4}\right)$ suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat{\eta}^{\pi}$ and $\eta^{\pi}$ are below $\epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $etapi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Convergence Rates of Stochastic Zeroth-order Gradient Descent for
Łojasiewicz Functions [6.137707924685666]
We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions.
Our results show that $\{ f(\mathbf{x}_t) - f(\mathbf{x}_\infty) \}_{t \in \mathbb{N}}$ can converge faster than $\{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N}}$.
arXiv Detail & Related papers (2022-10-31T00:53:17Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Spiked Covariance Estimation from Modulo-Reduced Measurements [14.569322713960494]
We develop and analyze an algorithm that, for most directions $\mathbf{u}$ and $\nu = \mathrm{poly}(k)$, estimates $\mathbf{u}$ to high accuracy using $n = \mathrm{poly}(k)$ measurements.
Numerical experiments show that the developed algorithm performs well even in a non-asymptotic setting.
arXiv Detail & Related papers (2021-10-04T02:10:47Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)