Efficient Distribution Learning with Error Bounds in Wasserstein Distance
- URL: http://arxiv.org/abs/2602.08063v1
- Date: Sun, 08 Feb 2026 17:14:30 GMT
- Title: Efficient Distribution Learning with Error Bounds in Wasserstein Distance
- Authors: Eduardo Figueiredo, Steven Adams, Luca Laurenti
- Abstract summary: The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions. We devise a novel framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples.
- Score: 5.3077298055859545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions, with applications in various fields, including machine learning, control theory, decision theory, and biological systems. Consequently, learning an unknown distribution with non-asymptotic and easy-to-compute error bounds in Wasserstein distance has become a fundamental problem in many fields. In this paper, we devise a novel algorithmic and theoretical framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples by an approximate discrete distribution $\widehat{\mathbb{P}}$ while bounding the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$. Our framework leverages optimal transport, nonlinear optimization, and concentration inequalities. In particular, we show that, even if $\mathbb{P}$ is unknown, the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$ can be efficiently bounded with high confidence by solving a tractable optimization problem (a mixed integer linear program) of a size that only depends on the size of the support of $\widehat{\mathbb{P}}$. This enables us to develop intelligent clustering algorithms to optimally find the support of $\widehat{\mathbb{P}}$ while minimizing the Wasserstein distance error. On a set of benchmarks, we demonstrate that our approach outperforms state-of-the-art comparable methods by generally returning approximating distributions with substantially smaller support and tighter error bounds.
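The paper's actual error bound comes from a mixed integer linear program that is not reproduced here. As a rough, hedged illustration of the pipeline the abstract describes (cluster the samples to choose the support of $\widehat{\mathbb{P}}$, then measure the Wasserstein error), the sketch below uses k-means as a stand-in for the paper's clustering algorithms and solves the discrete optimal-transport linear program with SciPy; the sample size, the choice k = 8, and all function names are illustrative assumptions.

```python
# Hedged sketch (not the paper's MILP-based bound): approximate an empirical
# distribution by a small discrete distribution via k-means, then compute the
# exact W1 distance between the two discrete distributions by solving the
# optimal-transport linear program with SciPy.
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import linprog

def w1_discrete(x, a, y, b):
    """Exact W1 between discrete distributions (x, a) and (y, b) via the OT LP."""
    n, m = len(x), len(y)
    C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # ground cost matrix
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # row marginals: sum_j P_ij = a_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # column marginals: sum_i P_ij = b_j
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))                        # draws from the unknown P
centers, labels = kmeans2(samples, 8, minit="++", seed=0)  # support of P-hat (k = 8 assumed)
weights = np.bincount(labels, minlength=8) / len(samples)
empirical = np.full(len(samples), 1.0 / len(samples))
print("W1(empirical, P-hat) ~", w1_discrete(samples, empirical, centers, weights))
```

The LP grows with the product of the two support sizes, which is why keeping the support of $\widehat{\mathbb{P}}$ small, as the paper advocates, matters in practice.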
Related papers
- Low-degree lower bounds via almost orthonormal bases [47.83594448785856]
Low-degree methods have emerged as a powerful paradigm for providing evidence of statistical-computational gaps across a variety of high-dimensional statistical models.
In this work, we propose a more direct proof strategy.
arXiv Detail & Related papers (2025-09-11T11:07:36Z)
- Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond [40.79840141270367]
Given an unnormalized probability density $\pi \propto \mathrm{e}^{-V}$, estimating its normalizing constant $Z = \int_{\mathbb{R}^d} \mathrm{e}^{-V(x)}\,\mathrm{d}x$ or free energy $F = -\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning.
We propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.
arXiv Detail & Related papers (2025-02-07T00:05:28Z)
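For context on the quantity in the entry above, here is a minimal importance-sampling baseline for estimating $Z$ and $F = -\log Z$. This is emphatically not the paper's reverse-diffusion sampler; the Gaussian proposal and the toy potential are assumptions chosen so the exact answer is known.

```python
# Hypothetical baseline (not the paper's reverse-diffusion sampler): estimate
# Z = integral of exp(-V(x)) dx by importance sampling with a Gaussian proposal q.
# This only works when q covers the modes of exp(-V); the paper targets exactly
# the multimodal settings where naive choices like this one fail.
import numpy as np

def estimate_log_Z(V, d=2, n=100_000, scale=3.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=scale, size=(n, d))            # samples from q = N(0, scale^2 I)
    log_q = (-0.5 * np.sum(x**2, axis=1) / scale**2
             - 0.5 * d * np.log(2 * np.pi * scale**2))  # log density of q
    log_w = -V(x) - log_q                               # log importance weights
    m = log_w.max()                                     # stable log-sum-exp
    return m + np.log(np.mean(np.exp(log_w - m)))       # log Z = log E_q[e^{-V}/q]

V = lambda x: 0.5 * np.sum(x**2, axis=1)   # toy potential: here Z = 2*pi exactly (d = 2)
log_Z = estimate_log_Z(V)
print("log Z ~", log_Z, " exact:", np.log(2 * np.pi))
print("F = -log Z ~", -log_Z)
```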
- On lower bounds of the density of planar periodic sets without unit distances [55.2480439325792]
We introduce a novel approach to estimating $m_1(\mathbb{R}^2)$ by reformulating the problem as a Maximal Independent Set (MIS) problem on graphs constructed from the flat torus.
Our experimental results, supported by theoretical justifications of the proposed method, demonstrate that for a sufficiently wide range of parameters this approach does not improve the known lower bound.
arXiv Detail & Related papers (2024-11-20T12:07:19Z)
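A toy rendering of the MIS reformulation named in the entry above, not the authors' construction: grid the flat torus, join points at approximately unit toroidal distance, and take the density of a greedy independent set. The torus side, grid resolution, and distance tolerance are arbitrary assumptions.

```python
# Toy sketch of the MIS reformulation (not the authors' construction): points on
# a grid over the flat torus [0, L)^2 are vertices, edges join pairs at roughly
# unit toroidal distance, and a greedy independent set avoids unit distances.
import numpy as np

L, n, tol = 4.0, 40, 0.06      # torus side, grid points per axis, tolerance (assumed)
pts = np.stack(np.meshgrid(np.linspace(0, L, n, endpoint=False),
                           np.linspace(0, L, n, endpoint=False)), -1).reshape(-1, 2)
d = np.abs(pts[:, None, :] - pts[None, :, :])
d = np.minimum(d, L - d)                       # wraparound: distances on the torus
dist = np.sqrt((d ** 2).sum(-1))
conflict = np.abs(dist - 1.0) < tol            # edges: pairs at ~unit distance

independent = []
for v in range(len(pts)):                      # greedy maximal independent set
    if not any(conflict[v, u] for u in independent):
        independent.append(v)
print("density of unit-distance-avoiding set ~", len(independent) / len(pts))
```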
- Relative-Translation Invariant Wasserstein Distance [82.6068808353647]
We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$).
We show that $RW_p$ distances are also real distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ and invariant to distribution translations.
arXiv Detail & Related papers (2024-09-04T03:41:44Z)
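The $RW_p$ family itself is the paper's contribution; the sketch below only illustrates the translation-invariance idea for $p = 2$ in one dimension, where mean-centering removes the translation component exactly (for $W_2$, the squared distance decomposes into the squared mean gap plus the $W_2^2$ of the centered distributions). The sample sizes and distributions are assumptions.

```python
# Illustrative translation-invariant Wasserstein-type distance via mean-centering
# (a special case for p = 2; the paper's RW_p family is more general). For 1-D
# equal-size, equal-weight samples, W2 has a closed form via sorting.
import numpy as np

def w2_1d(x, y):
    """W2 between equal-size, equal-weight 1-D empirical distributions."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

def rw2_1d(x, y):
    """Translation-invariant variant: compare after removing the means."""
    return w2_1d(x - x.mean(), y - y.mean())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)
y = rng.normal(5.0, 1.0, size=1000)          # same shape, shifted by 5
print("W2  ~", w2_1d(x, y))                  # ~5: dominated by the shift
print("RW2 ~", rw2_1d(x, y))                 # ~0: invariant to the translation
```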
- Instance-Optimal Private Density Estimation in the Wasserstein Distance [37.58527481568219]
Estimating the density of a distribution from samples is a fundamental problem in statistics.
We study differentially private density estimation in the Wasserstein distance.
arXiv Detail & Related papers (2024-06-27T22:51:06Z)
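As a point of reference for the entry above, here is a generic differentially private Laplace-histogram baseline evaluated in $W_1$. It is not the paper's instance-optimal mechanism, and the bin count, privacy budget, and data range are assumptions.

```python
# Generic epsilon-DP baseline (not the paper's instance-optimal mechanism):
# release a Laplace-noised histogram and measure its W1 distance to the raw
# empirical distribution with SciPy.
import numpy as np
from scipy.stats import wasserstein_distance

def dp_density_w1(samples, bins=50, eps=1.0, seed=0):
    rng = np.random.default_rng(seed)
    counts, edges = np.histogram(samples, bins=bins, range=(0.0, 1.0))
    noisy = counts + rng.laplace(scale=2.0 / eps, size=bins)  # L1 sensitivity 2 (swap one sample)
    noisy = np.clip(noisy, 0.0, None)
    weights = noisy / noisy.sum()                             # released density estimate
    centers = (edges[:-1] + edges[1:]) / 2
    return wasserstein_distance(centers, samples, u_weights=weights)

samples = np.random.default_rng(1).beta(2, 5, size=5000)      # toy data on [0, 1]
print("W1(private histogram, empirical) ~", dp_density_w1(samples))
```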
- Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
We develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$ when $P$ has bounded covariance.
Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv Detail & Related papers (2024-06-10T17:48:36Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Robust Estimation under the Wasserstein Distance [27.382258439576606]
Given $n$ samples, of which $\varepsilon n$ are adversarially corrupted, we seek an estimate for $\mu$ with minimal Wasserstein error.
We prove new structural properties for partial optimal transport (POT) and use them to show that minimum distance estimation (MDE) under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk.
Since the popular Wasserstein generative adversarial network (WGAN) framework implements Wasserstein MDE via Kantorovich duality, our penalized dual enables large-scale generative modeling with contaminated datasets.
arXiv Detail & Related papers (2023-02-02T17:20:25Z)
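A toy one-dimensional version of Wasserstein minimum distance estimation under corruption, as context for the entry above. The paper's partial-OT machinery and penalized dual are not reproduced; the grid search and Gaussian location family are simplifying assumptions.

```python
# Toy 1-D Wasserstein MDE (not the paper's partial-OT machinery): fit a Gaussian
# location parameter mu to epsilon-corrupted samples by minimizing the W1
# distance to the model over a grid of candidates.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
n, eps = 1000, 0.1
data = rng.normal(3.0, 1.0, size=n)
data[: int(eps * n)] = 50.0                       # adversarial outliers

model_grid = np.linspace(-10, 10, 401)            # candidate mu values
ref = rng.normal(size=5000)                       # samples from N(0, 1)
errors = [wasserstein_distance(data, mu + ref) for mu in model_grid]
mu_hat = model_grid[int(np.argmin(errors))]
print("MDE estimate of mu:", mu_hat, "(sample mean would give:", data.mean(), ")")
```

Even this crude MDE lands near the true mu = 3, while the sample mean is dragged toward the outliers; the paper's point is to make such estimators both statistically optimal and scalable.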
- Simple Binary Hypothesis Testing under Local Differential Privacy and Communication Constraints [8.261182037130407]
We study simple binary hypothesis testing under both local differential privacy (LDP) and communication constraints.
We qualify our results as either minimax optimal or instance optimal.
arXiv Detail & Related papers (2023-01-09T18:36:49Z)
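As background for the entry above, here is a classical local-DP mechanism (randomized response, which also sends just one bit per user) with a simple count-based test. The paper studies optimal schemes under joint LDP and communication constraints, which this sketch does not implement; all parameter values are assumptions.

```python
# Hedged sketch of a classical eps-LDP mechanism for binary hypothesis testing
# (randomized response + a debiased count test), not the paper's optimal schemes.
import numpy as np

def randomized_response(bits, eps, rng):
    """Each user keeps their bit with prob e^eps / (1 + e^eps): eps-LDP, 1 bit sent."""
    keep = rng.random(len(bits)) < np.exp(eps) / (1 + np.exp(eps))
    return np.where(keep, bits, 1 - bits)

rng = np.random.default_rng(0)
n, eps = 10_000, 1.0
p, q = 0.3, 0.5                               # H0: Bern(p) vs H1: Bern(q)
bits = (rng.random(n) < q).astype(int)        # true distribution is H1
reports = randomized_response(bits, eps, rng)

# Debias the reported mean and decide by which hypothesis is closer.
a = np.exp(eps) / (1 + np.exp(eps))
est = (reports.mean() - (1 - a)) / (2 * a - 1)
print("decide", "H1" if abs(est - q) < abs(est - p) else "H0", "; debiased mean ~", est)
```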
- Estimation and inference for the Wasserstein distance between mixing measures in topic models [18.66039789963639]
The Wasserstein distance between mixing measures has come to occupy a central place in the statistical analysis of mixture models.
This work proposes a new canonical interpretation of this distance and provides tools to perform inference on the Wasserstein distance in topic models.
arXiv Detail & Related papers (2022-06-26T02:33:40Z)
- Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
- A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
These results imply that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
arXiv Detail & Related papers (2020-03-25T17:03:58Z)
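For readers unfamiliar with the generator approach invoked in the entry above, the standard Euclidean template reads as follows; this is textbook material, not the paper's manifold construction.

```latex
% Standard Euclidean template of the generator approach to Stein's method
% (textbook material; the paper extends this picture to a Riemannian manifold M).
% The overdamped Langevin diffusion with invariant measure mu ∝ e^{-V} has generator
\[
  \mathcal{A} f(x) = \Delta f(x) - \nabla V(x) \cdot \nabla f(x),
\]
% and for a test function h, the Stein equation asks for f_h solving
\[
  \mathcal{A} f_h(x) = h(x) - \int h \,\mathrm{d}\mu .
\]
% "Stein factors" bound f_h and its derivatives in terms of h; for any candidate
% measure nu, |E_nu[h] - E_mu[h]| = |E_nu[A f_h]| is then controlled.
```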