The Sketched Wasserstein Distance for mixture distributions
- URL: http://arxiv.org/abs/2206.12768v1
- Date: Sun, 26 Jun 2022 02:33:40 GMT
- Title: The Sketched Wasserstein Distance for mixture distributions
- Authors: Xin Bing and Florentina Bunea and Jonathan Niles-Weed
- Abstract summary: The Sketched Wasserstein Distance ($W^S$) is a new probability distance specifically tailored to finite mixture distributions.
We show that $W^S$ is the most discriminative convex extension of a given metric $d$ on a set $\mathcal{A}$ of probability distributions to the space $\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of $\mathcal{A}$.
- Score: 13.643197515573029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Sketched Wasserstein Distance ($W^S$) is a new probability distance
specifically tailored to finite mixture distributions. Given any metric $d$
defined on a set $\mathcal{A}$ of probability distributions, $W^S$ is defined
to be the most discriminative convex extension of this metric to the space
$\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of
$\mathcal{A}$. Our representation theorem shows that the space $(\mathcal{S},
W^S)$ constructed in this way is isomorphic to a Wasserstein space over
$\mathcal{X} = (\mathcal{A}, d)$. This result establishes a universality
property for the Wasserstein distances, revealing them to be uniquely
characterized by their discriminative power for finite mixtures. We exploit
this representation theorem to propose an estimation methodology based on
Kantorovich--Rubinstein duality, and prove a general theorem that shows that
its estimation error can be bounded by the sum of the errors of estimating the
mixture weights and the mixture components, for any estimators of these
quantities. We derive sharp statistical properties for the estimated $W^S$ in
the case of $p$-dimensional discrete $K$-mixtures, which we show can be
estimated at a rate proportional to $\sqrt{K/N}$, up to logarithmic factors. We
complement these bounds with a minimax lower bound on the risk of estimating
the Wasserstein distance between distributions on a $K$-point metric space,
which matches our upper bound up to logarithmic factors. This result is the
first nearly tight minimax lower bound for estimating the Wasserstein distance
between discrete distributions. Furthermore, we construct $\sqrt{N}$
asymptotically normal estimators of the mixture weights, and derive a
$\sqrt{N}$ distributional limit of our estimator of $W^S$ as a consequence.
Simulation studies and a data analysis provide strong support for the
applicability of the new Sketched Wasserstein Distance.
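As a concrete illustration of the representation theorem, the sketch below computes $W^S$ between two mixtures of the same $K$ components by solving the discrete optimal transport problem between their weight vectors over the component metric space $(\mathcal{A}, d)$. This is a minimal sketch under illustrative assumptions (discrete components, total variation as the ground metric $d$, and the hypothetical helper `sketched_wasserstein`), not the authors' implementation.

```python
# Sketch: W^S via the representation theorem, i.e. as a Wasserstein
# distance between mixture-weight vectors over the metric space (A, d).
# The component construction and the choice of d are assumptions.
import numpy as np
from scipy.optimize import linprog

def sketched_wasserstein(alpha, beta, D):
    """W^S between sum_k alpha_k A_k and sum_k beta_k A_k, given the
    K x K matrix D of pairwise component distances d(A_i, A_j)."""
    K = len(alpha)
    c = D.ravel()  # minimize <D, Pi> over couplings Pi of (alpha, beta)
    A_eq = np.zeros((2 * K, K * K))
    for i in range(K):
        A_eq[i, i * K:(i + 1) * K] = 1.0  # row marginals: Pi @ 1 = alpha
        A_eq[K + i, i::K] = 1.0           # column marginals: Pi.T @ 1 = beta
    b_eq = np.concatenate([alpha, beta])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Toy example: K = 3 discrete components on p = 5 points,
# with d taken to be the total variation distance.
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(5), size=3)
D = np.array([[0.5 * np.abs(a - b).sum() for b in A] for a in A])
alpha = np.array([0.5, 0.3, 0.2])
beta = np.array([0.2, 0.3, 0.5])
print(sketched_wasserstein(alpha, beta, D))
```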
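The estimation methodology in the abstract rests on Kantorovich--Rubinstein duality: $W_1(\alpha, \beta) = \max_f \sum_i f_i (\alpha_i - \beta_i)$ over potentials $f$ satisfying $f_i - f_j \le d(A_i, A_j)$. The hedged sketch below encodes this dual linear program; plugging in any estimators of the weights and of the component distances yields a plug-in estimate of $W^S$. The helper name `kr_dual` and the LP encoding are assumptions for illustration.

```python
# Sketch: Kantorovich--Rubinstein dual LP for W_1 on a K-point metric space.
# alpha_hat and beta_hat must each sum to 1, or the LP is unbounded.
import numpy as np
from scipy.optimize import linprog

def kr_dual(alpha_hat, beta_hat, D_hat):
    K = len(alpha_hat)
    c = -(alpha_hat - beta_hat)  # maximize <f, alpha - beta>
    # One constraint f_i - f_j <= D_hat[i, j] per ordered pair (i, j).
    rows, cols = np.nonzero(~np.eye(K, dtype=bool))
    A_ub = np.zeros((len(rows), K))
    A_ub[np.arange(len(rows)), rows] = 1.0
    A_ub[np.arange(len(rows)), cols] = -1.0
    b_ub = D_hat[rows, cols]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
    return -res.fun
```

On the toy example above, `kr_dual(alpha, beta, D)` matches the primal value from `sketched_wasserstein`, as strong LP duality guarantees.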
Related papers
- Sharp bounds for max-sliced Wasserstein distances [0.0]
We establish matching upper and lower bounds for the expected max-sliced 1-Wasserstein distance between a probability measure on a separable Hilbert space and its empirical distribution from $n$ samples.
We also obtain an upper bound, sharp up to a logarithmic factor, for the expected max-sliced 2-Wasserstein distance between a symmetric probability measure $\mu$ on a Euclidean space and its empirical distribution.
arXiv Detail & Related papers (2024-03-01T16:55:10Z) - Estimating the Mixing Coefficients of Geometrically Ergodic Markov Processes [5.00389879175348]
We estimate the individual $\beta$-mixing coefficients of a real-valued geometrically ergodic Markov process from a single sample path.
Naturally, no density assumptions are required in this setting; the expected error rate is shown to be of order $\mathcal{O}(\log(n)\, n^{-1/2})$.
arXiv Detail & Related papers (2024-02-11T20:17:10Z) - Properties of Discrete Sliced Wasserstein Losses [11.280151521887076]
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures.
Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW.
We investigate the regularity and optimisation properties of this objective, as well as its Monte-Carlo approximation $\mathcal{E}_p$; a minimal sketch of such an approximation appears after this list.
arXiv Detail & Related papers (2023-07-19T21:21:18Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Robust Estimation under the Wasserstein Distance [28.792608997509376]
We introduce a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from its input distributions.
We show that minimum distance estimation under $\mathsf{W}_p^\varepsilon$ achieves the minimax optimal robust estimation risk.
arXiv Detail & Related papers (2023-02-02T17:20:25Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields convergence rates of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$, up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
These results imply that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
arXiv Detail & Related papers (2020-03-25T17:03:58Z)
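As referenced in the "Properties of Discrete Sliced Wasserstein Losses" entry above, a minimal sketch of the Monte-Carlo approximation of the Sliced Wasserstein distance follows: average the one-dimensional Wasserstein distance over random projection directions. The sample data and the helper name `sliced_wasserstein_mc` are illustrative assumptions, not code from that paper.

```python
# Sketch: Monte-Carlo approximation of the sliced 1-Wasserstein distance
# between two empirical measures, by averaging 1D distances over random
# projection directions drawn uniformly from the unit sphere.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein_mc(X, Y, n_proj=200, seed=0):
    """X, Y: (n, d) arrays of samples from the two measures."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)  # uniform direction on the sphere
        total += wasserstein_distance(X @ theta, Y @ theta)
    return total / n_proj

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = rng.normal(loc=0.5, size=(500, 3))
print(sliced_wasserstein_mc(X, Y))
```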
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.