The Sketched Wasserstein Distance for mixture distributions
- URL: http://arxiv.org/abs/2206.12768v1
- Date: Sun, 26 Jun 2022 02:33:40 GMT
- Title: The Sketched Wasserstein Distance for mixture distributions
- Authors: Xin Bing and Florentina Bunea and Jonathan Niles-Weed
- Abstract summary: The Sketched Wasserstein Distance ($W^S$) is a new probability distance specifically tailored to finite mixture distributions.
We show that $W^S$ is the most discriminative convex extension of a metric $d$ on a set $\mathcal{A}$ of probability distributions to the space $\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of $\mathcal{A}$.
- Score: 13.643197515573029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Sketched Wasserstein Distance ($W^S$) is a new probability distance
specifically tailored to finite mixture distributions. Given any metric $d$
defined on a set $\mathcal{A}$ of probability distributions, $W^S$ is defined
to be the most discriminative convex extension of this metric to the space
$\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of
$\mathcal{A}$. Our representation theorem shows that the space $(\mathcal{S},
W^S)$ constructed in this way is isomorphic to a Wasserstein space over
$\mathcal{X} = (\mathcal{A}, d)$. This result establishes a universality
property for the Wasserstein distances, revealing them to be uniquely
characterized by their discriminative power for finite mixtures. We exploit
this representation theorem to propose an estimation methodology based on
Kantorovich--Rubinstein duality, and prove a general theorem showing that
its estimation error can be bounded by the sum of the errors of estimating the
mixture weights and the mixture components, for any estimators of these
quantities. We derive sharp statistical properties for the estimated $W^S$ in
the case of $p$-dimensional discrete $K$-mixtures, which we show can be
estimated at a rate proportional to $\sqrt{K/N}$, up to logarithmic factors. We
complement these bounds with a minimax lower bound on the risk of estimating
the Wasserstein distance between distributions on a $K$-point metric space,
which matches our upper bound up to logarithmic factors. This result is the
first nearly tight minimax lower bound for estimating the Wasserstein distance
between discrete distributions. Furthermore, we construct $\sqrt{N}$
asymptotically normal estimators of the mixture weights, and derive a
$\sqrt{N}$ distributional limit of our estimator of $W^S$ as a consequence.
Simulation studies and a data analysis provide strong support for the
applicability of the new Sketched Wasserstein Distance.
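The representation theorem gives a concrete recipe for computing $W^S$ when the mixture weights and components are known: evaluate the ground metric $d$ between all component pairs, then solve a finite optimal transport problem between the two weight vectors. The sketch below illustrates this under stated assumptions: total variation as the ground metric and scipy's generic LP solver are illustrative choices, not the paper's methodology.

```python
import numpy as np
from scipy.optimize import linprog

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def sketched_wasserstein(alpha, beta, A, B, d=tv):
    """W^S between the mixtures sum_k alpha[k]*A[k] and sum_l beta[l]*B[l].

    By the representation theorem, this equals the Wasserstein distance
    between the weight vectors alpha and beta over the metric space of
    mixture components, with ground metric d.
    """
    K, L = len(alpha), len(beta)
    # Ground cost: pairwise distances between mixture components.
    D = np.array([[d(A[k], B[l]) for l in range(L)] for k in range(K)])
    # Optimal transport LP: minimize <D, P> subject to the marginal
    # constraints P @ 1 = alpha and P.T @ 1 = beta, with P >= 0.
    A_eq = np.zeros((K + L, K * L))
    for k in range(K):
        A_eq[k, k * L:(k + 1) * L] = 1.0   # row sums equal alpha
    for l in range(L):
        A_eq[K + l, l::L] = 1.0            # column sums equal beta
    b_eq = np.concatenate([alpha, beta])
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Example: two 2-component mixtures of distributions on 3 points.
A = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
B = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
print(sketched_wasserstein(np.array([0.5, 0.5]), np.array([0.3, 0.7]), A, B))
```

For the $p$-dimensional discrete $K$-mixtures studied in the paper, A and B would be row-stochastic component matrices and alpha, beta the (estimated) mixture weights.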
Related papers
- A Relaxed Wasserstein Distance Formulation for Mixtures of Radially Contoured Distributions [5.876704494595038]
We propose a simple relaxed Wasserstein distance for identifiable mixtures of radially contoured distributions.
We establish several properties of this distance and show that its definition does not require marginal consistency.
arXiv Detail & Related papers (2025-03-18T04:39:31Z)
- Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures [10.694077392690447]
Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering.
We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing density estimation of the mixing measure (or mixture) as an inference target.
arXiv Detail & Related papers (2024-11-22T02:15:38Z)
- Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent, and makes the inductive bias of the model clear and interpretable.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
- Centered plug-in estimation of Wasserstein distances [0.0]
The plug-in estimator of the squared Euclidean 2-Wasserstein distance is conservative; however, due to its large positive bias, it is often uninformative.
We construct a pair of centered plug-in estimators that decrease with the true Wasserstein distance, and are therefore guaranteed to be informative, for any finite sample size.
arXiv Detail & Related papers (2022-03-22T11:26:18Z)
- Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-infty, a divide-and-conquer approach that reduces density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
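The divide-and-conquer step rests on a telescoping identity: the log density ratio between the endpoint distributions is the sum of log ratios between adjacent bridge distributions, each easier to estimate than the whole. A minimal sketch, assuming an illustrative Gaussian bridge path with analytic densities (the paper instead learns each infinitesimal ratio with a classifier):

```python
import numpy as np
from scipy.stats import norm

def log_ratio_telescoped(x, n_bridges=10):
    """Telescoping decomposition of a log density ratio.

    log p1(x)/p0(x) = sum over adjacent bridges q_t of log q_{t+1}(x)/q_t(x).
    The bridges here interpolate N(0,1) -> N(4,1) and each term is analytic;
    DRE-infty estimates each term (in the limit, a time derivative) instead.
    """
    means = np.linspace(0.0, 4.0, n_bridges + 1)
    total = np.zeros_like(np.asarray(x, dtype=float))
    for m0, m1 in zip(means[:-1], means[1:]):
        total += norm.logpdf(x, m1) - norm.logpdf(x, m0)  # easy subproblem
    return total  # equals norm.logpdf(x, 4.0) - norm.logpdf(x, 0.0)
```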
arXiv Detail & Related papers (2021-11-22T06:26:29Z)
- Tangent Space and Dimension Estimation with the Wasserstein Distance [10.118241139691952]
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space.
We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
arXiv Detail & Related papers (2021-10-12T21:02:06Z)
- Depth-based pseudo-metrics between probability distributions [1.1470070927586016]
We propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions.
In contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality.
The regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails.
arXiv Detail & Related papers (2021-03-23T17:33:18Z)
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
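To make the barycenter notion concrete: in one dimension it admits a closed form, with the barycenter's quantile function equal to the weighted average of the input quantile functions. A minimal sketch of that special case (the paper's algorithm addresses the continuous multivariate setting, where no such formula exists):

```python
import numpy as np

def w2_barycenter_quantiles(samples, weights, grid_size=200):
    """Quantile function of the W2 barycenter of 1-D empirical measures.

    In 1-D, the barycenter's quantile function is the weighted average of
    the inputs' quantile functions; evaluating it on a grid of quantile
    levels gives an inverse-CDF representation of the barycenter.
    """
    qs = np.linspace(0.005, 0.995, grid_size)
    quantile_curves = np.stack([np.quantile(s, qs) for s in samples])
    return qs, np.average(quantile_curves, axis=0, weights=weights)
```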
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
- Two-sample Test using Projected Wasserstein Distance [18.46110328123008]
We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning.
A key contribution is to couple the test with an optimal projection, which finds the low-dimensional linear mapping that maximizes the Wasserstein distance between the projected probability distributions.
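A rough sketch of the resulting statistic: project both samples onto a unit direction, compute the 1-D Wasserstein distance, and take the direction that makes this distance largest. The random search below is a hypothetical stand-in for the paper's optimal-projection computation; replacing the max by an average gives the integral variant discussed in the next entry.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def projected_w_statistic(x, y, n_candidates=500, seed=0):
    """Maximize the 1-D Wasserstein distance over random unit directions.

    x, y: (n x d) and (m x d) sample arrays. The paper optimizes the
    projection; random search is used here only as an illustration.
    """
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_candidates):
        theta = rng.standard_normal(x.shape[1])
        theta /= np.linalg.norm(theta)           # unit projection direction
        best = max(best, wasserstein_distance(x @ theta, y @ theta))
    return best
```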
arXiv Detail & Related papers (2020-10-22T18:08:58Z)
- On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z)
- When OT meets MoM: Robust estimation of Wasserstein Distance [8.812837829361923]
We consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers.
We introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model.
We propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
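For context, the standard Sinkhorn iterations that such a re-weighting scheme would plug into look as follows; the MoM-based robust re-weighting of the marginals is the paper's contribution and is not reproduced here.

```python
import numpy as np

def sinkhorn_cost(a, b, M, reg=0.1, n_iter=500):
    """Entropically regularized OT cost via standard Sinkhorn iterations.

    a, b: positive marginal weight vectors summing to 1; M: ground cost
    matrix; reg: entropic regularization strength. A MoM-based scheme
    would first re-weight a and b to down-weight contaminated points.
    """
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # scale columns to match marginal b
        u = a / (K @ v)     # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]  # transport plan
    return float((P * M).sum())
```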
arXiv Detail & Related papers (2020-06-18T07:31:39Z)
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples.
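The k-NN construction referred to here is the classical estimator of Wang, Kulkarni and Verdú (2009), which compares nearest-neighbor distances within one sample and across to the other. A minimal sketch of that baseline (not necessarily the paper's minimax-optimal refinement):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y, k=1):
    """Classical k-NN estimate of KL(p || q) from x ~ p (n x d), y ~ q (m x d).

    rho: distance from each x_i to its k-th nearest neighbor within x
    (excluding itself); nu: to its k-th nearest neighbor within y.
    """
    n, d = x.shape
    m = y.shape[0]
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]  # skip the point itself
    nu = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        nu = nu[:, -1]
    return d * float(np.mean(np.log(nu / rho))) + np.log(m / (n - 1))
```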
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
- Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, in terms of which commonly used models such as the Dirichlet process and the beta-Bernoulli process can be expressed, decompose into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)