Convergence of Gaussian-smoothed optimal transport distance with
sub-gamma distributions and dependent samples
- URL: http://arxiv.org/abs/2103.00394v1
- Date: Sun, 28 Feb 2021 04:30:23 GMT
- Title: Convergence of Gaussian-smoothed optimal transport distance with
sub-gamma distributions and dependent samples
- Authors: Yixing Zhang, Xiuyuan Cheng, Galen Reeves
- Abstract summary: This paper provides convergence guarantees for estimating the GOT distance under more general settings.
A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy distances.
- Score: 12.77426855794452
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by
Goldfeld et al., scales to high dimensions in estimation and provides an
alternative to entropy regularization. This paper provides convergence
guarantees for estimating the GOT distance under more general settings. For the
Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results
require only the existence of a moment greater than $d + 2p$. For the special
case of sub-gamma distributions, we quantify the dependence on the dimension
$d$ and establish a phase transition with respect to the scale parameter. We
also prove convergence for dependent samples, only requiring a condition on the
pairwise dependence of the samples measured by the covariance of the feature
map of a kernel space.
A key step in our analysis is to show that the GOT distance is dominated by a
family of kernel maximum mean discrepancy (MMD) distances with a kernel that
depends on the cost function as well as the amount of Gaussian smoothing. This
insight provides further interpretability for the GOT framework and also
introduces a class of kernel MMD distances with desirable properties. The
theoretical results are supported by numerical experiments.
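For orientation, the quantity being estimated is the smoothed distance obtained by convolving both measures with an isotropic Gaussian, in the sense of Goldfeld et al.:
$$W_p^{(\sigma)}(\mu, \nu) = W_p\big(\mu * \mathcal{N}(0, \sigma^2 I_d),\; \nu * \mathcal{N}(0, \sigma^2 I_d)\big),$$
where $*$ denotes convolution and $\sigma > 0$ is the smoothing level. The sketch below is a generic plug-in estimate of this quantity for illustration only, not the paper's estimator or its dominating kernel MMD (whose kernel depends on the cost function and on $\sigma$): Gaussian noise is added to each sample, the resulting assignment problem is solved exactly, and a plain Gaussian-RBF MMD is computed alongside for comparison.

```python
# A minimal sketch, assuming equal sample sizes and uniform weights; a generic
# plug-in estimator for illustration, not the construction analyzed in the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def smoothed_wasserstein(x, y, sigma, p=1, seed=0):
    """Plug-in estimate of W_p(mu * N(0, sigma^2 I_d), nu * N(0, sigma^2 I_d)).

    x, y : (n, d) arrays of samples from mu and nu.  Smoothing is applied by
    adding independent Gaussian noise to every sample; for two uniform
    empirical measures of equal size the optimal coupling is an assignment,
    which linear_sum_assignment solves exactly.
    """
    rng = np.random.default_rng(seed)
    xs = x + sigma * rng.standard_normal(x.shape)
    ys = y + sigma * rng.standard_normal(y.shape)
    cost = cdist(xs, ys, metric="euclidean") ** p      # pairwise ||x - y||^p
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean() ** (1.0 / p)


def rbf_mmd(x, y, bandwidth=1.0):
    """Biased (V-statistic) MMD estimate with a Gaussian RBF kernel.

    The dominating MMD in the paper uses a kernel tied to the cost function and
    the amount of smoothing; the RBF kernel here is only a familiar stand-in.
    """
    def k(a, b):
        return np.exp(-cdist(a, b, metric="sqeuclidean") / (2.0 * bandwidth ** 2))
    return float(np.sqrt(k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 500, 5
    x = rng.standard_normal((n, d))           # samples from N(0, I_d)
    y = rng.standard_normal((n, d)) + 0.5     # samples from a shifted Gaussian
    print("smoothed W_1 estimate:", smoothed_wasserstein(x, y, sigma=1.0, p=1))
    print("RBF-MMD estimate     :", rbf_mmd(x, y, bandwidth=1.0))
```

The assignment-based OT step keeps the sketch dependency-light; for larger problems one would typically reach for a dedicated optimal transport library such as POT.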
Related papers
- Truncated Kernel Stochastic Gradient Descent on Spheres [1.4583059436979549]
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm.
T-kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in stochastic gradient descent.
In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance.
arXiv Detail & Related papers (2024-10-02T14:09:51Z)
- Relative-Translation Invariant Wasserstein Distance [82.6068808353647]
We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$).
We show that $RW_p$ distances are genuine distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ and are invariant to distribution translations.
arXiv Detail & Related papers (2024-09-04T03:41:44Z)
- Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
We prove an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cell estimation and density support estimation.
To nearly achieve the desired fast rate, we design an entropic regularization scheme decreasing with the number of samples.
arXiv Detail & Related papers (2024-05-23T11:46:03Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Estimation of entropy-regularized optimal transport maps between non-compactly supported measures [15.857723276537248]
This paper addresses the problem of estimating entropy-regularized optimal transport maps with squared-Euclidean cost between source and target measures that are sub-Gaussian.
arXiv Detail & Related papers (2023-11-20T17:18:21Z)
- A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing [3.469038201881982]
We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$.
We apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study.
arXiv Detail & Related papers (2023-02-11T12:49:46Z)
- Optimal 1-Wasserstein Distance for WGANs [2.1174215880331775]
We provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite-sample and asymptotic regimes.
We derive in passing new results on optimal transport theory in the semi-discrete setting.
arXiv Detail & Related papers (2022-01-08T13:04:03Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their means in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
- On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z)
- The Convergence Indicator: Improved and completely characterized parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will eventually converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.