From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical
Approximation and Statistical Applications
- URL: http://arxiv.org/abs/2101.04039v2
- Date: Thu, 14 Jan 2021 18:39:00 GMT
- Title: From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical
Approximation and Statistical Applications
- Authors: Sloan Nietert, Ziv Goldfeld, Kengo Kato
- Abstract summary: We show that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th order smooth dual Sobolev norm $\mathsf{d}_p^{(\sigma)}$.
We derive the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ in all dimensions $d$, when $\mu$ is sub-Gaussian.
- Score: 18.618590805279187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Statistical distances, i.e., discrepancy measures between probability
distributions, are ubiquitous in probability theory, statistics and machine
learning. To combat the curse of dimensionality when estimating these distances
from data, recent work has proposed smoothing out local irregularities in the
measured distributions via convolution with a Gaussian kernel. Motivated by the
scalability of the smooth framework to high dimensions, we conduct an in-depth
study of the structural and statistical behavior of the Gaussian-smoothed
$p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p\geq 1$. We
start by showing that $\mathsf{W}_p^{(\sigma)}$ admits a metric structure that
is topologically equivalent to classic $\mathsf{W}_p$ and is stable with
respect to perturbations in $\sigma$. Moving to statistical questions, we
explore the asymptotic properties of
$\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, where $\hat{\mu}_n$ is the
empirical distribution of $n$ i.i.d. samples from $\mu$. To that end, we prove
that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th order smooth dual
Sobolev norm $\mathsf{d}_p^{(\sigma)}$. Since
$\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ coincides with the supremum of an
empirical process indexed by Gaussian-smoothed Sobolev functions, it lends
itself well to analysis via empirical process theory. We derive the limit
distribution of $\sqrt{n}\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ in all
dimensions $d$, when $\mu$ is sub-Gaussian. Through the aforementioned bound,
this implies a parametric empirical convergence rate of $n^{-1/2}$ for
$\mathsf{W}_p^{(\sigma)}$, contrasting the $n^{-1/d}$ rate for unsmoothed
$\mathsf{W}_p$ when $d \geq 3$. As applications, we provide asymptotic
guarantees for two-sample testing and minimum distance estimation. When $p=2$,
we further show that $\mathsf{d}_2^{(\sigma)}$ can be expressed as a maximum
mean discrepancy.
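To make the smoothing construction concrete, the following is a minimal numerical sketch, not code from the paper: it estimates $\mathsf{W}_1^{(\sigma)}(\hat{\mu}_n,\mu) = \mathsf{W}_1(\hat{\mu}_n \ast \mathcal{N}_\sigma, \mu \ast \mathcal{N}_\sigma)$ in $d=1$ for $\mu = \mathcal{N}(0,1)$, exploiting the fact that $\mathsf{W}_p$ between two equal-size samples on the line reduces to an $L^p$ distance between order statistics. The function names and the Monte Carlo scheme are illustrative assumptions, not the authors' method.

```python
# A minimal sketch (illustrative, not from the paper): plug-in Monte Carlo
# estimate of the Gaussian-smoothed 1-Wasserstein distance in d = 1.
import numpy as np

rng = np.random.default_rng(0)

def wasserstein_1d(a, b, p=1):
    """W_p between two equal-size empirical measures on the real line."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)) ** p) ** (1.0 / p)

def smooth_w1_vs_gaussian(x, sigma, m=200_000, rng=rng):
    """Plug-in estimate of W_1^(sigma)(mu_hat_n, mu) for mu = N(0, 1).

    mu_hat_n * N_sigma is sampled by bootstrapping x and adding N(0, sigma^2)
    noise; mu * N_sigma = N(0, 1 + sigma^2) is sampled directly.
    """
    smoothed_emp = rng.choice(x, size=m) + sigma * rng.standard_normal(m)
    smoothed_mu = np.sqrt(1.0 + sigma**2) * rng.standard_normal(m)
    return wasserstein_1d(smoothed_emp, smoothed_mu)

# The abstract's parametric n^{-1/2} rate suggests sqrt(n) * W_1^(sigma)
# should roughly stabilize as n grows (up to Monte Carlo error).
for n in [100, 400, 1600, 6400]:
    x = rng.standard_normal(n)  # n i.i.d. samples from mu = N(0, 1)
    est = smooth_w1_vs_gaussian(x, sigma=1.0)
    print(f"n={n:5d}  sqrt(n) * W_1^(sigma) estimate ~ {np.sqrt(n) * est:.3f}")
```

In $d \geq 3$ the analogous experiment with the unsmoothed $\mathsf{W}_1$ would exhibit the slower $n^{-1/d}$ decay, which is the curse of dimensionality that smoothing is designed to avoid.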
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators for distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Improved convergence rate of kNN graph Laplacians [11.93971616098517]
A general class of $k$NN graphs where the graph affinity takes the form $W_{ij} = \epsilon^{-d/2}\cdots$.
We prove the point-wise convergence of the $k$NN graph Laplacian to the limiting manifold operator.
arXiv Detail & Related papers (2024-10-30T17:01:00Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^2(1-\gamma)^4}\right)$ suffices to ensure that both the Kolmogorov metric and the total variation metric between $\hat{\eta}^{\pi}$ and $\eta^{\pi}$ are below $\epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $etapi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning the graphical structure of a $p$-dimensional random vector $X \in \mathbb{R}^p$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-features technique called Ternary Random Feature (TRF); an illustrative sketch follows below.
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits of storage compared to classical random features.
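As a purely illustrative aside, here is a hedged sketch of what a ternary random-features map could look like; the sampling probabilities and the ReLU nonlinearity below are assumptions, not details from that paper. The point is structural: with projection weights in $\{-1, 0, +1\}$, the projection needs only additions, subtractions, and sign flips.

```python
# Hypothetical ternary random-features sketch (assumed construction, not the
# paper's exact TRF): projection entries lie in {-1, 0, +1}, so S @ x can be
# computed multiplication-free by dedicated kernels (numpy multiplies anyway),
# and each entry fits in ~2 bits instead of a full float.
import numpy as np

rng = np.random.default_rng(0)

def ternary_matrix(n_features, dim, zero_prob=0.5, rng=rng):
    """Draw S with P(0) = zero_prob and P(-1) = P(+1) = (1 - zero_prob) / 2."""
    p = (1.0 - zero_prob) / 2.0
    return rng.choice([-1.0, 0.0, 1.0], size=(n_features, dim),
                      p=[p, zero_prob, p])

S = ternary_matrix(n_features=512, dim=64)
x = rng.standard_normal(64)
phi = np.maximum(S @ x, 0.0)  # assumed ReLU activation on the ternary projection
```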
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Nonasymptotic one- and two-sample tests in high dimension with unknown
covariance structure [0.0]
We consider the problem of testing whether $\mu$ is $\eta$-close to zero, i.e., $\|\mu\| \leq \eta$, against $\|\mu\| \geq \eta + \delta$.
The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that both the Type I and Type II errors can be controlled at a given level.
arXiv Detail & Related papers (2021-09-01T06:22:53Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields rates of convergence of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)