The Sketched Wasserstein Distance for mixture distributions
- URL: http://arxiv.org/abs/2206.12768v1
- Date: Sun, 26 Jun 2022 02:33:40 GMT
- Title: The Sketched Wasserstein Distance for mixture distributions
- Authors: Xin Bing and Florentina Bunea and Jonathan Niles-Weed
- Abstract summary: The Sketched Wasserstein Distance ($W^S$) is a new probability distance specifically tailored to finite mixture distributions.
We show that $W^S$ is the most discriminative convex extension of a given metric $d$ on a set $\mathcal{A}$ of probability distributions to the space $\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of $\mathcal{A}$.
- Score: 13.643197515573029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Sketched Wasserstein Distance ($W^S$) is a new probability distance
specifically tailored to finite mixture distributions. Given any metric $d$
defined on a set $\mathcal{A}$ of probability distributions, $W^S$ is defined
to be the most discriminative convex extension of this metric to the space
$\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of
$\mathcal{A}$. Our representation theorem shows that the space $(\mathcal{S},
W^S)$ constructed in this way is isomorphic to a Wasserstein space over
$\mathcal{X} = (\mathcal{A}, d)$. This result establishes a universality
property for the Wasserstein distances, revealing them to be uniquely
characterized by their discriminative power for finite mixtures. We exploit
this representation theorem to propose an estimation methodology based on
Kantorovich--Rubinstein duality, and prove a general theorem that shows that
its estimation error can be bounded by the sum of the errors of estimating the
mixture weights and the mixture components, for any estimators of these
quantities. We derive sharp statistical properties for the estimated $W^S$ in
the case of $p$-dimensional discrete $K$-mixtures, which we show can be
estimated at a rate proportional to $\sqrt{K/N}$, up to logarithmic factors. We
complement these bounds with a minimax lower bound on the risk of estimating
the Wasserstein distance between distributions on a $K$-point metric space,
which matches our upper bound up to logarithmic factors. This result is the
first nearly tight minimax lower bound for estimating the Wasserstein distance
between discrete distributions. Furthermore, we construct $\sqrt{N}$
asymptotically normal estimators of the mixture weights, and derive a
$\sqrt{N}$ distributional limit of our estimator of $W^S$ as a consequence.
Simulation studies and a data analysis provide strong support for the
applicability of the new Sketched Wasserstein Distance.
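As a concrete illustration of the representation theorem, the sketch below computes $W^S$ between two mixtures of the same $K$ components by solving the discrete optimal transport problem between their weight vectors over the component metric space $(\mathcal{A}, d)$. This is a minimal sketch under illustrative assumptions (discrete components, total variation as the ground metric $d$, and the hypothetical helper `sketched_wasserstein`), not the authors' implementation.

```python
# Sketch: W^S via the representation theorem, i.e. as a Wasserstein
# distance between mixture-weight vectors over the metric space (A, d).
# The component construction and the choice of d are assumptions.
import numpy as np
from scipy.optimize import linprog

def sketched_wasserstein(alpha, beta, D):
    """W^S between sum_k alpha_k A_k and sum_k beta_k A_k, given the
    K x K matrix D of pairwise component distances d(A_i, A_j)."""
    K = len(alpha)
    c = D.ravel()  # minimize <D, Pi> over couplings Pi of (alpha, beta)
    A_eq = np.zeros((2 * K, K * K))
    for i in range(K):
        A_eq[i, i * K:(i + 1) * K] = 1.0  # row marginals: Pi @ 1 = alpha
        A_eq[K + i, i::K] = 1.0           # column marginals: Pi.T @ 1 = beta
    b_eq = np.concatenate([alpha, beta])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Toy example: K = 3 discrete components on p = 5 points,
# with d taken to be the total variation distance.
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(5), size=3)
D = np.array([[0.5 * np.abs(a - b).sum() for b in A] for a in A])
alpha = np.array([0.5, 0.3, 0.2])
beta = np.array([0.2, 0.3, 0.5])
print(sketched_wasserstein(alpha, beta, D))
```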
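The estimation methodology in the abstract rests on Kantorovich--Rubinstein duality: $W_1(\alpha, \beta) = \max_f \sum_i f_i (\alpha_i - \beta_i)$ over potentials $f$ satisfying $f_i - f_j \le d(A_i, A_j)$. The hedged sketch below encodes this dual linear program; plugging in any estimators of the weights and of the component distances yields a plug-in estimate of $W^S$. The helper name `kr_dual` and the LP encoding are assumptions for illustration.

```python
# Sketch: Kantorovich--Rubinstein dual LP for W_1 on a K-point metric space.
# alpha_hat and beta_hat must each sum to 1, or the LP is unbounded.
import numpy as np
from scipy.optimize import linprog

def kr_dual(alpha_hat, beta_hat, D_hat):
    K = len(alpha_hat)
    c = -(alpha_hat - beta_hat)  # maximize <f, alpha - beta>
    # One constraint f_i - f_j <= D_hat[i, j] per ordered pair (i, j).
    rows, cols = np.nonzero(~np.eye(K, dtype=bool))
    A_ub = np.zeros((len(rows), K))
    A_ub[np.arange(len(rows)), rows] = 1.0
    A_ub[np.arange(len(rows)), cols] = -1.0
    b_ub = D_hat[rows, cols]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
    return -res.fun
```

On the toy example above, `kr_dual(alpha, beta, D)` matches the primal value from `sketched_wasserstein`, as strong LP duality guarantees.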
Related papers
- Sharp bounds for max-sliced Wasserstein distances [0.0]
We establish matching upper and lower bounds for the expected max-sliced 1-Wasserstein distance between a probability measure on a separable Hilbert space and its empirical distribution from $n$ samples.
We also obtain an upper bound, sharp up to a logarithmic factor, for the expected max-sliced 2-Wasserstein distance between a symmetric probability measure $\mu$ on a Euclidean space and its empirical distribution.
arXiv Detail & Related papers (2024-03-01T16:55:10Z) - Estimating the Mixing Coefficients of Geometrically Ergodic Markov Processes [5.00389879175348]
We estimate the individual $\beta$-mixing coefficients of a real-valued geometrically ergodic Markov process from a single sample path.
Naturally, no density assumptions are required in this setting; the expected error rate is shown to be of order $\mathcal{O}(\log(n)\, n^{-1/2})$.
arXiv Detail & Related papers (2024-02-11T20:17:10Z) - Properties of Discrete Sliced Wasserstein Losses [11.280151521887076]
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures.
Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW.
We investigate the regularity and optimisation properties of this objective, as well as its Monte-Carlo approximation $\mathcal{E}_p$; a minimal sketch of such an approximation appears after this list.
arXiv Detail & Related papers (2023-07-19T21:21:18Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Robust Estimation under the Wasserstein Distance [28.792608997509376]
We introduce a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from its input distributions.
We show that minimum distance estimation under $\mathsf{W}_p^\varepsilon$ achieves the minimax optimal robust estimation risk.
arXiv Detail & Related papers (2023-02-02T17:20:25Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields convergence rates of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$, up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
These results imply that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
arXiv Detail & Related papers (2020-03-25T17:03:58Z)
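As referenced in the "Properties of Discrete Sliced Wasserstein Losses" entry above, a minimal sketch of the Monte-Carlo approximation of the Sliced Wasserstein distance follows: average the one-dimensional Wasserstein distance over random projection directions. The sample data and the helper name `sliced_wasserstein_mc` are illustrative assumptions, not code from that paper.

```python
# Sketch: Monte-Carlo approximation of the sliced 1-Wasserstein distance
# between two empirical measures, by averaging 1D distances over random
# projection directions drawn uniformly from the unit sphere.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein_mc(X, Y, n_proj=200, seed=0):
    """X, Y: (n, d) arrays of samples from the two measures."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)  # uniform direction on the sphere
        total += wasserstein_distance(X @ theta, Y @ theta)
    return total / n_proj

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = rng.normal(loc=0.5, size=(500, 3))
print(sliced_wasserstein_mc(X, Y))
```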
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.