Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian,
and Beyond $1+\alpha$ Moments
- URL: http://arxiv.org/abs/2311.12784v1
- Date: Tue, 21 Nov 2023 18:50:38 GMT
- Title: Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian,
and Beyond $1+\alpha$ Moments
- Authors: Trung Dang, Jasper C.H. Lee, Maoyuan Song, Paul Valiant
- Abstract summary: We introduce a new definitional framework to analyze the fine-grained optimality of algorithms.
We show that median-of-means is neighborhood optimal, up to constant factors.
It is open to find a neighborhood-optimal estimator without constant factor slackness.
- Score: 10.889739958035536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing interest in improving our algorithmic understanding of
fundamental statistical problems such as mean estimation, driven by the goal of
understanding the limits of what we can extract from valuable data. The state
of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal
sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for
all distributions with finite but unknown variance, and 2) the analysis of the
median-of-means algorithm by [BCL13] and a lower bound by [DLLO16],
characterizing the big-O optimal errors for distributions for which only a
$1+\alpha$ moment exists for $\alpha \in (0,1)$. Both results, however, are
optimal only in the worst case. We initiate the fine-grained study of the mean
estimation problem: Can algorithms leverage useful features of the input
distribution to beat the sub-Gaussian rate, without explicit knowledge of such
features?
We resolve this question with an unexpectedly nuanced answer: "Yes in limited
regimes, but in general no". For any distribution $p$ with a finite mean, we
construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$
and $q$ are not distinguishable with high probability, and $q$ further
preserves $p$'s moments up to constants. The main consequence is that no
reasonable estimator can asymptotically achieve better than the sub-Gaussian
error rate for any distribution, matching the worst-case result of [LV22]. More
generally, we introduce a new definitional framework to analyze the
fine-grained optimality of algorithms, which we call "neighborhood optimality",
interpolating between the unattainably strong "instance optimality" and the
trivially weak "admissibility" definitions. Applying the new framework, we show
that median-of-means is neighborhood optimal, up to constant factors. It is
open to find a neighborhood-optimal estimator without constant factor
slackness.
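To make the quantities in the abstract concrete: the sub-Gaussian rate discussed above is the guarantee that, with probability at least $1-\delta$, an estimator $\hat{\mu}$ built from $n$ i.i.d. samples of a distribution with mean $\mu$ and variance $\sigma^2$ satisfies $|\hat{\mu}-\mu| \le (1+o(1))\,\sigma\sqrt{2\ln(1/\delta)/n}$. Below is a minimal sketch of the textbook median-of-means estimator, not the exact construction analyzed in [BCL13] or [LV22]; the group count $k$ is a free parameter, typically taken on the order of $\log(1/\delta)$, and the values below are hypothetical.

```python
import numpy as np

def median_of_means(samples, k, rng=None):
    """Median-of-means: split the data into k groups, average each group,
    and return the median of the k group means."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(rng.permutation(samples), k)
    return float(np.median([g.mean() for g in groups]))

# Toy comparison on heavy-tailed data: a classical Pareto with shape 1.5 has a
# finite mean (= 3) but infinite variance, i.e. only a 1+alpha moment for alpha < 0.5.
rng = np.random.default_rng(0)
data = rng.pareto(1.5, size=10_000) + 1.0
print(median_of_means(data, k=32, rng=rng), data.mean())
```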
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the iteration cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Optimal score estimation via empirical Bayes smoothing [13.685846094715364]
We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions.
We show that a regularized score estimator based on a Gaussian kernel attains this rate, shown optimal by a matching minimax lower bound.
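A minimal sketch of a kernel-based score estimator of the kind described above, under the assumption that the "regularized score estimator based on a Gaussian kernel" can be read in the usual kernel-density sense; the paper's actual construction, regularization, and empirical Bayes smoothing may differ from this illustration.

```python
import numpy as np

def gaussian_kernel_score(x, data, h):
    """Score of the Gaussian-kernel density estimate at point x.

    For rho_h(x) = mean_i N(x; x_i, h^2 I), the score is
    grad log rho_h(x) = sum_i w_i(x) * (x_i - x) / h^2,
    with softmax weights w_i(x) proportional to N(x; x_i, h^2 I).
    data has shape (n, d); x has shape (d,).
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    diffs = data - x                                   # (n, d)
    log_w = -0.5 * np.sum(diffs ** 2, axis=1) / h**2   # unnormalized log-weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / h**2

# Sanity check: for standard normal data, the true score at x is -x
# (the kernel-smoothed score is approximately -x / (1 + h^2)).
rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 2))
print(gaussian_kernel_score(np.array([0.5, -1.0]), data, h=0.3))
```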
arXiv Detail & Related papers (2024-02-12T16:17:40Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
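To see the Gaussian direction of this claim concretely, here is a standard computation (not taken from the paper), assuming the usual additive Gaussian noise observation model $Y = X + Z$, which may differ in detail from the paper's setup: a Gaussian prior yields a Gaussian, hence symmetric, posterior, whose median equals its mean and is linear in $y$.

```latex
% Illustrative setup: X ~ N(0, \sigma_X^2), Y = X + Z, Z ~ N(0, \sigma_Z^2) independent.
\[
  P_{X \mid Y = y} = \mathcal{N}\!\left(\tfrac{\sigma_X^2}{\sigma_X^2 + \sigma_Z^2}\, y,\;
                                         \tfrac{\sigma_X^2 \sigma_Z^2}{\sigma_X^2 + \sigma_Z^2}\right)
  \quad\Longrightarrow\quad
  \operatorname{med}(X \mid Y = y) = \mathbb{E}[X \mid Y = y]
    = \tfrac{\sigma_X^2}{\sigma_X^2 + \sigma_Z^2}\, y,
\]
% a posterior symmetric about its mean, with a median that is linear in y.
```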
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach [36.88301225561535]
Offline reinforcement learning aims to learn from pre-collected datasets without active exploration.
Existing approaches adopt a pessimistic stance towards uncertainty by penalizing rewards of under-explored state-action pairs to estimate value functions conservatively.
We show that the distributionally robust optimization (DRO) based approach can also address these challenges and is asymptotically minimax optimal.
arXiv Detail & Related papers (2023-05-22T17:50:18Z) - Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V^*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z) - General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean
Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as this classical approach.
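A minimal sketch of the "classical approach" in the first sentence above, in its simplest form: the plain $(\epsilon, \delta)$-DP Gaussian mechanism on bounded data. This is illustrative only; the paper itself concerns concentrated differential privacy and possibly correlated noise, which this sketch does not reproduce, and all parameter values below are hypothetical.

```python
import numpy as np

def gaussian_mechanism_mean(data, lo, hi, eps, delta, rng=None):
    """Private mean via the plain Gaussian mechanism.

    Assumes every record already lies in [lo, hi] (the clip is then a no-op
    and the released estimate stays unbiased); noise is calibrated to the
    mean's sensitivity (hi - lo) / n with the standard (eps, delta) scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(data, dtype=float), lo, hi)
    n = len(x)
    sensitivity = (hi - lo) / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return x.mean() + rng.normal(0.0, sigma)

# Hypothetical usage: data known to lie in [0, 10].
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 10.0, size=1000)
print(gaussian_mechanism_mean(data, lo=0.0, hi=10.0, eps=1.0, delta=1e-5, rng=rng))
```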
arXiv Detail & Related papers (2023-01-31T18:47:42Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise algorithms matching it.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers.
Our algorithms follow the Sum-of-Squares based "proofs to algorithms" approach.
arXiv Detail & Related papers (2022-06-07T16:49:54Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
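A minimal sketch of gradient clipping in the sense used here: generic clipped SGD. The paper's specific stepsize rules and non-smooth analysis are not reproduced, and the clip level, stepsize, and toy objective below are hypothetical.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, steps, lr, clip_level, rng=None):
    """SGD in which every stochastic (sub)gradient is clipped in L2 norm,
    which tames heavy-tailed gradient noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = np.asarray(grad_fn(x, rng), dtype=float)
        norm = np.linalg.norm(g)
        if norm > clip_level:
            g = g * (clip_level / norm)
        x = x - lr * g
    return x

# Toy objective: E[ ||x - S||^2 / 2 ] with heavy-tailed samples S centered at a,
# so the stochastic gradient x - S has heavy-tailed noise; the minimizer is a.
a = np.array([2.0, -1.0])
def grad_fn(x, rng):
    s = a + rng.standard_t(df=2.1, size=a.shape)
    return x - s

print(clipped_sgd(grad_fn, x0=np.zeros(2), steps=5000, lr=0.01, clip_level=1.0))
```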
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Robust estimation via generalized quasi-gradients [28.292300073453877]
We show why many recently proposed robust estimation problems are efficiently solvable.
We identify the existence of "generalized quasi-gradients".
We show that generalized quasi-gradients exist and construct efficient algorithms.
arXiv Detail & Related papers (2020-05-28T15:14:33Z)