Memory Efficient And Minimax Distribution Estimation Under Wasserstein
Distance Using Bayesian Histograms
- URL: http://arxiv.org/abs/2307.10099v1
- Date: Wed, 19 Jul 2023 16:13:20 GMT
- Title: Memory Efficient And Minimax Distribution Estimation Under Wasserstein
Distance Using Bayesian Histograms
- Authors: Peter Matthew Jacobs, Lekha Patel, Anirban Bhattacharya, Debdeep Pati
- Abstract summary: We show that when $d < 2v$, histograms possess a special \textit{memory efficiency} property: in reference to the sample size $n$, order $n^{d/2v}$ bins are needed to obtain minimax rate optimality.
The attained memory footprint overcomes existing minimax optimal procedures by a polynomial factor in $n$; for example, an $n^{1 - d/2v}$ factor reduction in the footprint when compared to the empirical measure, a minimax estimator in the Borel probability measure class.
- Score: 6.21295508577576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study Bayesian histograms for distribution estimation on $[0,1]^d$ under
the Wasserstein $W_v, 1 \leq v < \infty$ distance in the i.i.d. sampling regime.
We newly show that when $d < 2v$, histograms possess a special \textit{memory
efficiency} property, whereby in reference to the sample size $n$, order
$n^{d/2v}$ bins are needed to obtain minimax rate optimality. This result holds
for the posterior mean histogram and with respect to posterior contraction:
under the class of Borel probability measures and some classes of smooth
densities. The attained memory footprint overcomes existing minimax optimal
procedures by a polynomial factor in $n$; for example an $n^{1 - d/2v}$ factor
reduction in the footprint when compared to the empirical measure, a minimax
estimator in the Borel probability measure class. Additionally constructing
both the posterior mean histogram and the posterior itself can be done
super-linearly in $n$. Due to the popularity of the $W_1,W_2$ metrics and the
coverage provided by the $d < 2v$ case, our results are of most practical
interest in the $(d=1,v =1,2), (d=2,v=2), (d=3,v=2)$ settings and we provide
simulations demonstrating the theory in several of these instances.
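As a concrete illustration of the bin-count scaling above, the following sketch compares the $W_1$ error and memory footprint of a histogram with about $n^{1/2}$ bins (the $n^{d/2v}$ rate for $d=1$, $v=1$) against the empirical measure. The Beta(2, 5) target and the plain frequency histogram are illustrative assumptions standing in for the paper's Bayesian posterior-mean construction; SciPy's `wasserstein_distance` is used for the one-dimensional $W_1$.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
n = 10_000
x = rng.beta(2.0, 5.0, size=n)                   # i.i.d. sample on [0, 1]

m = int(np.ceil(n ** 0.5))                       # order n^{d/2v} bins for d = 1, v = 1
edges = np.linspace(0.0, 1.0, m + 1)
counts, _ = np.histogram(x, bins=edges)
probs = counts / n                               # histogram bin weights (memory: m numbers)

# Sample from the histogram measure (uniform within each bin) and compare W_1
# errors against a large reference sample standing in for the true Beta(2, 5) law.
idx = rng.choice(m, size=n, p=probs)
hist_sample = rng.uniform(edges[idx], edges[idx + 1])
reference = rng.beta(2.0, 5.0, size=200_000)

print("W_1(histogram, truth) ~", wasserstein_distance(hist_sample, reference))
print("W_1(empirical, truth) ~", wasserstein_distance(x, reference))
print("memory footprint:", m, "bin weights vs", n, "stored sample points")
```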
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Multiple-policy Evaluation via Density Estimation [30.914344538340412]
We propose an algorithm named $\mathrm{CAESAR}$ for this problem.
Up to low order and logarithmic terms, $\mathrm{CAESAR}$ achieves a sample complexity of $\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^{H}\max_{k\in[K]}\sum_{s,a}\frac{(d_h^{\pi^k}(s,a))^2}{\mu^*_h(s,a)}\right)$, where $d^{\pi}$ ...
arXiv Detail & Related papers (2024-03-29T23:55:25Z) - Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions [11.222970035173372]
We show that a kernel-based score estimator achieves an optimal mean square error of $\widetilde{O}\left(n^{-1} t^{-\frac{d+2}{2}}\left(t^{\frac{d}{2}} \vee 1\right)\right)$.
We further establish an $\widetilde{O}\left(n^{-1/2} t^{-\frac{d}{4}}\right)$ upper bound for the total variation error of the distribution of the sample generated by the diffusion model, under a mere sub-Gaussian assumption.
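A minimal sketch of one form a kernel-based score estimator can take for a variance-exploding corruption $X_t = X_0 + \sigma_t Z$: the exact score of the Gaussian-smoothed empirical measure. The bandwidth, toy data, and query point are illustrative assumptions, not the construction or tuning analyzed in the cited paper.

```python
import numpy as np

def kernel_score(x, data, sigma_t):
    """Score of the Gaussian-smoothed empirical measure at query point x."""
    diff = data - x                                   # (n, d): x_i - x
    logw = -np.sum(diff ** 2, axis=1) / (2.0 * sigma_t ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                      # softmax weights over the samples
    # grad_x log( (1/n) sum_i N(x; x_i, sigma_t^2 I) )
    return (w @ diff) / sigma_t ** 2

rng = np.random.default_rng(3)
data = 0.5 * rng.standard_normal((5_000, 2)) + 1.0    # toy "clean" samples in R^2
print(kernel_score(np.array([0.0, 0.0]), data, sigma_t=0.3))
```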
arXiv Detail & Related papers (2024-02-23T20:51:31Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - Robust Sparse Mean Estimation via Incremental Learning [15.536082641659423]
In this paper, we study the problem of robust mean estimation, where the goal is to estimate a $k$-sparse mean from a collection of partially corrupted samples.
We present a simple mean estimator that overcomes both challenges under moderate conditions.
Our method does not need any prior knowledge of the sparsity level $k$.
arXiv Detail & Related papers (2023-05-24T16:02:28Z) - Estimating the minimizer and the minimum value of a regression function
under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$ (the proposed estimator of $\boldsymbol{x}^*$), and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell$ distance essentially at most a constant times larger than that of its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Likelihood estimation of sparse topic distributions in topic models and
its applications to Wasserstein document distance calculations [3.679981089267181]
In topic models, the $p\times n$ expected word frequency matrix is assumed to factorize as the product of a $p\times K$ word-topic matrix $A$ and a $K\times n$ topic-document matrix $T$.
The columns of $A$ are viewed as $p$-dimensional mixture components that are common to all documents.
When $A$ is unknown, we estimate $T$ by optimizing the likelihood function corresponding to a plug-in, generic, estimator $\hat{A}$ of $A$.
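A small sketch of the plug-in likelihood step just described: with a given (hypothetical) word-topic estimate `A_hat` and synthetic counts, a single document's topic proportions are recovered by maximizing $\sum_j x_j \log(\hat{A} t)_j$ over the simplex. The EM-style multiplicative update below is one standard way to compute that maximizer, not necessarily the optimizer used in the paper.

```python
import numpy as np

def estimate_topic_distribution(x, A_hat, n_iter=500):
    """Plug-in MLE of topic proportions t for word counts x, given A_hat of shape (p, K)."""
    p, K = A_hat.shape
    N = x.sum()
    t = np.full(K, 1.0 / K)                       # start at the uniform point of the simplex
    for _ in range(n_iter):
        mix = A_hat @ t                           # current word probabilities, shape (p,)
        resp = (A_hat * t) / mix[:, None]         # topic responsibilities per word, shape (p, K)
        t = (x / N) @ resp                        # EM update; stays on the simplex
    return t

rng = np.random.default_rng(1)
p, K = 50, 3
A_hat = rng.dirichlet(np.ones(p), size=K).T       # columns: p-dimensional mixture components
t_true = np.array([0.6, 0.3, 0.1])
x = rng.multinomial(2_000, A_hat @ t_true)        # synthetic word counts for one document
print(np.round(estimate_topic_distribution(x, A_hat), 3), "vs true", t_true)
```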
arXiv Detail & Related papers (2021-07-12T22:22:32Z) - Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs
with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features.
We show that a model-based approach (resp. $Q$-learning) provably learns an $\varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt{d}/\ldots$
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
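A quick numerical illustration of the inversion bias discussed in the last entry: averaging $((SA)^\top SA)^{-1}$ over independent dense Gaussian sketches converges not to $(A^\top A)^{-1}$ but to an inflated multiple of it (factor $m/(m-d-1)$ for Gaussian sketches). The problem sizes are illustrative assumptions, and the sparse, small-inversion-bias constructions of the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m, trials = 500, 10, 40, 1_000

A = rng.standard_normal((n, d))
truth = np.linalg.inv(A.T @ A)

avg = np.zeros((d, d))
for _ in range(trials):
    S = rng.standard_normal((m, n)) / np.sqrt(m)      # dense Gaussian sketching matrix
    SA = S @ A
    avg += np.linalg.inv(SA.T @ SA) / trials          # average of the naive inverse estimates

print("empirical inflation of the averaged inverse:", np.trace(avg) / np.trace(truth))
print("Gaussian-sketch prediction m / (m - d - 1): ", m / (m - d - 1))
```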