Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian,
and Beyond $1+\alpha$ Moments
- URL: http://arxiv.org/abs/2311.12784v1
- Date: Tue, 21 Nov 2023 18:50:38 GMT
- Title: Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian,
and Beyond $1+\alpha$ Moments
- Authors: Trung Dang, Jasper C.H. Lee, Maoyuan Song, Paul Valiant
- Abstract summary: We introduce a new definitional framework to analyze the fine-grained optimality of algorithms.
We show that median-of-means is neighborhood optimal, up to constant factors.
It remains open to find a neighborhood-optimal estimator without constant-factor slackness.
- Score: 10.889739958035536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing interest in improving our algorithmic understanding of
fundamental statistical problems such as mean estimation, driven by the goal of
understanding the limits of what we can extract from valuable data. The state
of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal
sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for
all distributions with finite but unknown variance, and 2) the analysis of the
median-of-means algorithm by [BCL13] and a lower bound by [DLLO16],
characterizing the big-O optimal errors for distributions for which only a
$1+\alpha$ moment exists for $\alpha \in (0,1)$. Both results, however, are
optimal only in the worst case. We initiate the fine-grained study of the mean
estimation problem: Can algorithms leverage useful features of the input
distribution to beat the sub-Gaussian rate, without explicit knowledge of such
features?
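For context, the two worst-case guarantees referenced above are conventionally stated as follows; this is a standard formulation up to exact constants (with $\sigma^2$ the variance and $\bar{\sigma}_{1+\alpha}^{1+\alpha} = \mathbb{E}|X-\mu|^{1+\alpha}$ the central $(1+\alpha)$-th moment), not the paper's precise theorem statements. With probability at least $1-\delta$ over $n$ i.i.d. samples,
\[
|\hat{\mu}_n - \mu| \le (1+o(1))\,\sigma\sqrt{\frac{2\ln(1/\delta)}{n}} \quad \text{(finite variance, [LV22])},
\]
\[
|\hat{\mu}_n - \mu| \le O\!\left(\bar{\sigma}_{1+\alpha}\left(\frac{\ln(1/\delta)}{n}\right)^{\frac{\alpha}{1+\alpha}}\right) \quad \text{(only a $(1+\alpha)$-th moment, [BCL13], [DLLO16])}.
\]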
We resolve this question with an unexpectedly nuanced answer: "Yes in limited
regimes, but in general no". For any distribution $p$ with a finite mean, we
construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$
and $q$ are not distinguishable with high probability, and $q$ further
preserves $p$'s moments up to constants. The main consequence is that no
reasonable estimator can asymptotically achieve better than the sub-Gaussian
error rate for any distribution, matching the worst-case result of [LV22]. More
generally, we introduce a new definitional framework to analyze the
fine-grained optimality of algorithms, which we call "neighborhood optimality",
interpolating between the unattainably strong "instance optimality" and the
trivially weak "admissibility" definitions. Applying the new framework, we show
that median-of-means is neighborhood optimal, up to constant factors. It is
open to find a neighborhood-optimal estimator without constant factor
slackness.
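As a concrete illustration of the median-of-means estimator analyzed in [BCL13] and shown above to be neighborhood optimal up to constants, here is a minimal NumPy sketch; the choice of roughly 8*ln(1/delta) groups and the heavy-tailed Pareto test data are illustrative assumptions rather than details taken from the paper.

import numpy as np

def median_of_means(x, delta=0.05, rng=None):
    """Estimate the mean of a 1-D sample via median-of-means.

    Splits the sample into k ~ log(1/delta) random groups, averages each
    group, and returns the median of the group means. With only a
    (1 + alpha)-th central moment, this achieves the big-O optimal
    high-probability error rate discussed in the abstract.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Number of groups: roughly 8 * ln(1/delta), capped so every group is non-empty.
    k = max(1, min(n, int(np.ceil(8 * np.log(1.0 / delta)))))
    rng = np.random.default_rng(rng)
    groups = np.array_split(x[rng.permutation(n)], k)  # random assignment to groups
    return float(np.median([g.mean() for g in groups]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pareto with shape 1.5: finite mean (= 3) but infinite variance, so only
    # moments of order 1 + alpha with alpha < 0.5 exist.
    sample = rng.pareto(1.5, size=10_000) + 1.0
    print(median_of_means(sample, delta=0.01))  # typically close to 3
    print(sample.mean())                        # empirical mean for comparison; the heavy upper tail makes it less reliable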
Related papers
- Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination [35.67742880001828]
We give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination.
Our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy.
arXiv Detail & Related papers (2025-02-20T17:53:13Z)
- Entangled Mean Estimation in High-Dimensions [36.97113089188035]
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model.
We show that the optimal error (up to polylogarithmic factors) is $f(\alpha, N) + \sqrt{D/(\alpha N)}$, where the term $f(\alpha, N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate.
arXiv Detail & Related papers (2025-01-09T18:31:35Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X \mid Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z)
- Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach [36.88301225561535]
Offline reinforcement learning aims to learn from pre-collected datasets without active exploration.
Existing approaches adopt a pessimistic stance towards uncertainty by penalizing rewards of under-explored state-action pairs to estimate value functions conservatively.
We show that the distributionally robust optimization (DRO)-based approach can also address these challenges and is asymptotically minimax optimal.
arXiv Detail & Related papers (2023-05-22T17:50:18Z)
- Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V^*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z)
- General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best Gaussian noise mechanism.
arXiv Detail & Related papers (2023-01-31T18:47:42Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, expressed as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Robust estimation via generalized quasi-gradients [28.292300073453877]
We show why many recently proposed robust estimation problems are efficiently solvable by identifying the existence of "generalized quasi-gradients".
We show that generalized quasi-gradients exist in these problems and construct efficient algorithms.
arXiv Detail & Related papers (2020-05-28T15:14:33Z)