Limit Distribution Theory for the Smooth 1-Wasserstein Distance with
Applications
- URL: http://arxiv.org/abs/2107.13494v2
- Date: Thu, 29 Jul 2021 13:39:02 GMT
- Title: Limit Distribution Theory for the Smooth 1-Wasserstein Distance with
Applications
- Authors: Ritwik Sadhu and Ziv Goldfeld and Kengo Kato
- Abstract summary: The smooth 1-Wasserstein distance (SWD) $W_1^\sigma$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation.
This work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result.
- Score: 18.618590805279187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The smooth 1-Wasserstein distance (SWD) $W_1^\sigma$ was recently proposed as
a means to mitigate the curse of dimensionality in empirical approximation
while preserving the Wasserstein structure. Indeed, SWD exhibits parametric
convergence rates and inherits the metric and topological structure of the
classic Wasserstein distance. Motivated by the above, this work conducts a
thorough statistical study of the SWD, including a high-dimensional limit
distribution result for empirical $W_1^\sigma$, bootstrap consistency,
concentration inequalities, and Berry-Esseen type bounds. The derived
nondegenerate limit stands in sharp contrast with the classic empirical $W_1$,
for which a similar result is known only in the one-dimensional case. We also
explore asymptotics and characterize the limit distribution when the smoothing
parameter $\sigma$ is scaled with $n$, converging to $0$ at a sufficiently slow
rate. The dimensionality of the sampled distribution enters empirical SWD
convergence bounds only through the prefactor (i.e., the constant). We provide
a sharp characterization of this prefactor's dependence on the smoothing
parameter and the intrinsic dimension. This result is then used to derive new
empirical convergence rates for classic $W_1$ in terms of the intrinsic
dimension. As applications of the limit distribution theory, we study
two-sample testing and minimum distance estimation (MDE) under $W_1^\sigma$. We
establish asymptotic validity of SWD testing, while for MDE, we prove
measurability, almost sure convergence, and limit distributions for optimal
estimators and their corresponding $W_1^\sigma$ error. Our results suggest that
the SWD is well suited for high-dimensional statistical learning and inference.
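For readers who want to experiment numerically, recall that the SWD is obtained by Gaussian-smoothing both measures before applying the classic distance, $W_1^\sigma(\mu,\nu) = W_1(\mu * \mathcal{N}_\sigma, \nu * \mathcal{N}_\sigma)$ with $\mathcal{N}_\sigma = \mathcal{N}(0,\sigma^2 I_d)$. The sketch below is a plug-in Monte Carlo estimate of the empirical SWD together with a permutation-calibrated two-sample test; it is not the authors' code, the permutation calibration is a simple stand-in for the bootstrap procedure whose consistency the paper establishes, and it assumes the POT library (`ot`) is installed.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed installed)


def smooth_w1(x, y, sigma, rng):
    """Plug-in Monte Carlo estimate of W_1^sigma between two samples.

    Each empirical measure is convolved with N(0, sigma^2 I) by adding
    independent Gaussian noise to every point; the discrete 1-Wasserstein
    distance between the noised samples is then solved exactly.
    """
    x_s = x + sigma * rng.standard_normal(x.shape)
    y_s = y + sigma * rng.standard_normal(y.shape)
    a = np.full(len(x_s), 1.0 / len(x_s))  # uniform weights on the sample
    b = np.full(len(y_s), 1.0 / len(y_s))
    cost = ot.dist(x_s, y_s, metric="euclidean")  # Euclidean cost => W_1
    return ot.emd2(a, b, cost)


def two_sample_test(x, y, sigma, n_perm=200, seed=0):
    """SWD two-sample test with permutation calibration.

    NOTE: permutation calibration is a simple stand-in for the bootstrap
    procedure analyzed in the paper, used here only for illustration.
    """
    rng = np.random.default_rng(seed)
    stat = smooth_w1(x, y, sigma, rng)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        s = smooth_w1(pooled[perm[:n]], pooled[perm[n:]], sigma, rng)
        count += s >= stat
    return stat, (1 + count) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 10, 500
    x = rng.standard_normal((n, d))        # samples from N(0, I_d)
    y = rng.standard_normal((n, d)) + 0.3  # mean-shifted alternative
    stat, pval = two_sample_test(x, y, sigma=1.0)
    print(f"SWD statistic: {stat:.4f}, permutation p-value: {pval:.3f}")
```

The exact solver `ot.emd2` scales roughly cubically in the sample size; for larger samples one would typically switch to an entropically regularized solver such as `ot.sinkhorn2`, at the cost of additional bias in the estimate.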
Related papers
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Gaussian-Smoothed Sliced Probability Divergences [15.123608776470077]
We show that smoothing and slicing preserve the metric property and the weak topology.
We also derive other properties, including continuity, of different divergences with respect to the smoothing parameter.
arXiv Detail & Related papers (2024-04-04T07:55:46Z) - Statistical Efficiency of Distributional Temporal Difference Learning [24.03281329962804]
We analyze the finite-sample performance of distributional temporal difference learning (CTD) and quantile temporal difference learning (QTD).
For a $\gamma$-discounted infinite-horizon Markov decision process, we show that for NTD we need $\tilde{O}\left(\frac{1}{\varepsilon^{2p}(1-\gamma)^{2p}}\right)$ samples to achieve an $\varepsilon$-optimal estimator with high probability.
We establish a novel Freedman's inequality in Hilbert spaces, which would be of independent interest.
arXiv Detail & Related papers (2024-03-09T06:19:53Z) - Estimation of entropy-regularized optimal transport maps between
non-compactly supported measures [15.857723276537248]
This paper addresses the problem of estimating entropy-regularized optimal transport maps with squared-Euclidean cost between source and target measures that are subGaussian.
arXiv Detail & Related papers (2023-11-20T17:18:21Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - A High-dimensional Convergence Theorem for U-statistics with
Applications to Kernel-based Testing [3.469038201881982]
We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$.
We apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study.
arXiv Detail & Related papers (2023-02-11T12:49:46Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability result for AdaGrad with a truly adaptive scheme, i.e., one completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Dimension-agnostic inference using cross U-statistics [33.17951971728784]
We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization.
The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks.
arXiv Detail & Related papers (2020-11-10T12:21:34Z) - On Projection Robust Optimal Transport: Sample Complexity and Model
Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)