Optimal Mean Estimation without a Variance
- URL: http://arxiv.org/abs/2011.12433v2
- Date: Tue, 8 Dec 2020 20:31:46 GMT
- Title: Optimal Mean Estimation without a Variance
- Authors: Yeshwanth Cherapanamjeri, Nilesh Tripuraneni, Peter L. Bartlett,
Michael I. Jordan
- Abstract summary: We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n,d,delta$.
- Score: 103.26777953032537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of heavy-tailed mean estimation in settings where the
variance of the data-generating distribution does not exist. Concretely, given
a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$
over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following
\emph{weak-moment} assumption for some ${\alpha \in [0, 1]}$: \begin{equation*}
\forall \|v\| = 1: \mathbb{E}_{X \thicksim \mathcal{D}}[\lvert \langle X - \mu,
v\rangle \rvert^{1 + \alpha}] \leq 1, \end{equation*} and given a target
failure probability, $\delta$, our goal is to design an estimator which attains
the smallest possible confidence interval as a function of $n,d,\delta$. For
the specific case of $\alpha = 1$, foundational work of Lugosi and Mendelson
exhibits an estimator achieving subgaussian confidence intervals, and
subsequent work has led to computationally efficient versions of this
estimator. Here, we study the case of general $\alpha$, and establish the
following information-theoretic lower bound on the optimal attainable
confidence interval: \begin{equation*} \Omega \left(\sqrt{\frac{d}{n}} +
\left(\frac{d}{n}\right)^{\frac{\alpha}{(1 + \alpha)}} + \left(\frac{\log 1 /
\delta}{n}\right)^{\frac{\alpha}{(1 + \alpha)}}\right). \end{equation*}
Moreover, we devise a computationally-efficient estimator which achieves this
lower bound.
Related papers
- Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
We show that the Monotonic Value Omega (MVP) algorithm achieves a variance-aware gap-dependent regret bound of $$tildeOleft(left(sum_Delta_h(s,a)>0 fracH2 log K land mathttVar_maxtextc$.
arXiv Detail & Related papers (2025-06-06T20:33:57Z) - On the $O(\rac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm [54.28350823319057]
This paper establishes the convergence rate $frac1Ksum_k=1KEleft[|nabla f(xk)|_1right]leq O(fracsqrtdCK1/4) for AdamW measured by $ell_$ norm, where $K$ represents the iteration number, $d denotes the model dimension, and $C$ matches the constant in the optimal convergence rate of SGD.
arXiv Detail & Related papers (2025-05-17T05:02:52Z) - Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs [0.0]
We provide evidence of computational hardness for two problems.
One of the main ingredient in our proof is to derive certain forms of emphalgorithm contiguity between two probability measures.
This framework provides a useful tool for performing reductions between different tasks.
arXiv Detail & Related papers (2025-02-14T00:24:51Z) - Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
We study the problem of learning a single neuron with respect to the $L2$ loss in the presence of adversarial distribution shifts.
A new algorithm is developed to approximate the vector vector squared loss with respect to the worst distribution that is in the $chi2$divergence to the $mathcalp_0$.
arXiv Detail & Related papers (2024-11-11T03:43:52Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Estimating the Mixing Coefficients of Geometrically Ergodic Markov
Processes [5.00389879175348]
We estimate the individual $beta$-mixing coefficients of a real-valued geometrically ergodic Markov process from a single sample-path.
Naturally no density assumptions are required in this setting; the expected error rate is shown to be of order $mathcal O(log(n) n-1/2)$.
arXiv Detail & Related papers (2024-02-11T20:17:10Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $widetilde Oleft(frac|mathcalS||mathcalA|epsilon2 (1-gamma)4right)$ suffices to ensure the Kolmogorov metric and total variation metric between $hatetapi$ and $etapi$ is below $epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $etapi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $mathbfA$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Efficient Sampling of Stochastic Differential Equations with Positive
Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $varepsilon$ with a cost that is $m2 d log (1/varepsilon)$
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z) - Convergence Rates of Stochastic Zeroth-order Gradient Descent for \L
ojasiewicz Functions [6.137707924685666]
We prove convergence rates of Zeroth-order Gradient Descent (SZGD) algorithms for Lojasiewicz functions.
Our results show that $ f (mathbfx_t) - f (mathbfx_infty) _t in mathbbN $ can converge faster than $ | mathbfx_infty.
arXiv Detail & Related papers (2022-10-31T00:53:17Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture
Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model.
We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $|theta_*|,delta,d,n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z) - Faster Rates of Convergence to Stationary Points in Differentially
Private Optimization [31.46358820048179]
We study the problem of approximating stationary points of Lipschitz and smooth functions under $(varepsilon,delta)$-differential privacy (DP)
A point $widehatw$ is called an $alpha$-stationary point of a function $mathbbRdrightarrowmathbbR$ if $|nabla F(widehatw)|leq alpha$.
We provide a new efficient algorithm that finds an $tildeObig(big[
arXiv Detail & Related papers (2022-06-02T02:43:44Z) - Spiked Covariance Estimation from Modulo-Reduced Measurements [14.569322713960494]
We develop and analyze an algorithm that, for most directions $bfu$ and $nu=mathrmpoly(k)$, estimates $bfu$ to high accuracy using $n=mathrmpoly(k)$ measurements.
Numerical experiments show that the developed algorithm performs well even in a non-asymptotic setting.
arXiv Detail & Related papers (2021-10-04T02:10:47Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(epsilon,delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator $(epsilon,delta)$-unbiased for $(Atop A)-1$ with a sketch of size $m=O(d+sqrt d/
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.