Estimating 2-Sinkhorn Divergence between Gaussian Processes from
Finite-Dimensional Marginals
- URL: http://arxiv.org/abs/2102.03267v1
- Date: Fri, 5 Feb 2021 16:17:55 GMT
- Title: Estimating 2-Sinkhorn Divergence between Gaussian Processes from
Finite-Dimensional Marginals
- Authors: Anton Mallasto
- Abstract summary: We study the convergence of estimating the 2-Sinkhorn divergence between Gaussian processes (GPs) using their finite-dimensional marginal distributions.
We show almost sure convergence of the divergence when the marginals are sampled according to some base measure.
- Score: 4.416484585765028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: \emph{Optimal Transport} (OT) has emerged as an important computational tool
in machine learning and computer vision, providing a geometrical framework for
studying probability measures. Unfortunately, OT suffers from the curse of
dimensionality and requires regularization for practical computation; a popular
choice is \emph{entropic regularization}, which can further be 'unbiased',
resulting in a \emph{Sinkhorn divergence}. In this work, we study
the convergence of estimating the 2-Sinkhorn divergence between \emph{Gaussian
processes} (GPs) using their finite-dimensional marginal distributions. We show
almost sure convergence of the divergence when the marginals are sampled
according to some base measure. Furthermore, we show that using $n$ marginals
the estimation error of the divergence scales in a dimension-free way as
$\mathcal{O}\left(\epsilon^{-1}n^{-\frac{1}{2}}\right)$, where $\epsilon$ is
the magnitude of entropic regularization.
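To make the estimated quantity concrete, here is a minimal numerical sketch (not the paper's construction): sample $n$ index points from a base measure, form the $n$-dimensional Gaussian marginals of two GPs, and estimate the debiased 2-Sinkhorn divergence $S_\epsilon(\alpha,\beta) = \mathrm{OT}_\epsilon(\alpha,\beta) - \tfrac{1}{2}\mathrm{OT}_\epsilon(\alpha,\alpha) - \tfrac{1}{2}\mathrm{OT}_\epsilon(\beta,\beta)$ with log-domain Sinkhorn iterations. The RBF kernels, the sample-based approximation of the Gaussian marginals, and all parameter values below are illustrative assumptions; the paper works with the Gaussian marginals directly.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_cost(x, y, eps=1.0, n_iter=500):
    """Entropic OT cost OT_eps between uniform empirical measures on sample sets
    x (n x d) and y (m x d), squared-Euclidean ground cost, log-domain Sinkhorn."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)      # n x m cost matrix
    n, m = C.shape
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))  # uniform weights
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    log_P = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
    return np.sum(np.exp(log_P) * C)                               # primal cost <P, C>

def sinkhorn_divergence(x, y, eps=1.0):
    """Debiased 2-Sinkhorn divergence S_eps(x, y)."""
    return sinkhorn_cost(x, y, eps) - 0.5 * (sinkhorn_cost(x, x, eps)
                                             + sinkhorn_cost(y, y, eps))

# Illustrative use: n index points drawn from a base measure, n-dimensional
# Gaussian marginals of two zero-mean GPs with RBF kernels of different length
# scales, each marginal approximated here by i.i.d. samples.
rng = np.random.default_rng(0)
n = 50                                     # number of marginal (index) points
t = rng.uniform(0, 1, size=(n, 1))         # base-measure samples of index points
k = lambda ts, ell: np.exp(-(ts - ts.T) ** 2 / (2 * ell ** 2)) + 1e-8 * np.eye(len(ts))
K1, K2 = k(t, 0.2), k(t, 0.5)              # marginal covariances of the two GPs
X = rng.multivariate_normal(np.zeros(n), K1, size=200)   # samples from marginal of GP 1
Y = rng.multivariate_normal(np.zeros(n), K2, size=200)   # samples from marginal of GP 2
print(sinkhorn_divergence(X, Y, eps=1.0))
```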
Related papers
- Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases.
A key technical challenge we address is the lack of a one-step contraction property in the $W_{2,\ell^\infty}$ metric used to measure convergence.
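For context, a bare-bones sketch of the unadjusted Langevin algorithm that the result concerns: the Euler-Maruyama discretisation of the Langevin diffusion targeting a density proportional to $e^{-U}$, with no Metropolis correction. The potential, step size, and iteration count below are placeholders, not the paper's setting.

```python
import numpy as np

def unadjusted_langevin(grad_U, x0, step=1e-2, n_steps=10_000, rng=None):
    """ULA: x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * xi_k,
    with xi_k ~ N(0, I); no accept/reject correction."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

# e.g. sampling (approximately) from a standard Gaussian in dimension d = 100
x_final = unadjusted_langevin(grad_U=lambda x: x, x0=np.zeros(100))
```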
arXiv Detail & Related papers (2024-08-20T01:24:54Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Gaussian-Smoothed Sliced Probability Divergences [15.123608776470077]
We show that smoothing and slicing preserve the metric property and the weak topology.
We also derive other properties, including continuity, of different divergences with respect to the smoothing parameter.
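A rough Monte-Carlo sketch of the object in question, a Gaussian-smoothed sliced 2-Wasserstein distance: project both samples onto random directions, smooth each 1-D projection by Gaussian noise of scale $\sigma$, and average the closed-form 1-D $W_2$ costs. Equal sample sizes and this particular single-noise-draw estimator are simplifying assumptions, not the paper's exact estimator.

```python
import numpy as np

def gaussian_smoothed_sliced_w2(x, y, sigma=0.1, n_proj=200, rng=None):
    """Gaussian-smoothed sliced 2-Wasserstein estimate between equal-size sample
    sets x, y of shape (n, d): random 1-D projections, Gaussian smoothing via
    added N(0, sigma^2) noise, 1-D W_2 via sorted samples."""
    assert x.shape == y.shape
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)                      # uniform direction on the sphere
        px = x @ theta + sigma * rng.standard_normal(n)     # smoothed projection of x
        py = y @ theta + sigma * rng.standard_normal(n)     # smoothed projection of y
        total += np.mean((np.sort(px) - np.sort(py)) ** 2)  # 1-D W_2^2 of order statistics
    return np.sqrt(total / n_proj)
```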
arXiv Detail & Related papers (2024-04-04T07:55:46Z) - Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products [12.047053875716506]
We consider the problem of sufficient dimension reduction for multi-index models.
We show that a fast parametric convergence rate of the form $C_d \cdot n^{-1/2}$ is achievable.
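A hedged sketch of the gradient-outer-product idea behind such estimators: estimate the gradient of the regression function at each sample by a kernel-weighted local linear fit, average the outer products of these gradients, and take leading eigenvectors as a basis of the central mean subspace. The Gaussian kernel, bandwidth, and plain local-linear estimator are illustrative choices, not the paper's exact smoothed estimator.

```python
import numpy as np

def gradient_outer_product_subspace(X, y, bandwidth=0.5, k=1):
    """Estimate a k-dimensional central mean subspace of E[y | X] from the
    averaged outer product of locally estimated gradients."""
    n, d = X.shape
    M = np.zeros((d, d))
    for i in range(n):
        diff = X - X[i]                                            # (n, d) displacements
        w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * bandwidth ** 2))
        sw = np.sqrt(w)
        A = np.hstack([np.ones((n, 1)), diff])                     # local linear design
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        beta = coef[1:]                                            # local gradient estimate
        M += np.outer(beta, beta) / n
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -k:]                                            # basis of the estimated subspace
```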
arXiv Detail & Related papers (2023-12-24T12:28:07Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Compressed and distributed least-squares regression: convergence rates
with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators.
We extend our results to the case of federated learning.
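To make "unbiased compression operator" concrete, a small sketch (not the paper's setup) of rand-$k$ sparsification applied to gradient descent on a least-squares objective; kept coordinates are rescaled by $d/k$ so the compressor is unbiased in expectation.

```python
import numpy as np

def rand_k(g, k, rng):
    """Unbiased rand-k compressor: keep k uniformly chosen coordinates of g,
    rescaled by d/k so that E[rand_k(g)] = g."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros(d)
    out[idx] = (d / k) * g[idx]
    return out

def compressed_gd_least_squares(A, b, k, n_iter=500, rng=None):
    """Gradient descent on 0.5/n * ||A x - b||^2 using compressed gradients."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    step = n / np.linalg.norm(A, 2) ** 2      # roughly 1/L for this quadratic
    x = np.zeros(d)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b) / n
        x -= step * rand_k(grad, k, rng)
    return x
```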
arXiv Detail & Related papers (2023-08-02T18:02:00Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Nonparametric approximation of conditional expectation operators [0.3655021726150368]
We investigate the approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[f(Y) \mid X = x]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space.
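As a concrete, hedged illustration of approximating $[Pf](x) = \mathbb{E}[f(Y) \mid X = x]$ by a finite-rank (hence Hilbert-Schmidt) operator, here is a standard kernel-ridge / conditional-mean estimate built from samples; the Gaussian kernel and ridge parameter are illustrative choices, not the paper's construction.

```python
import numpy as np

def conditional_expectation_estimator(x_train, y_train, f, lam=1e-3, ell=0.5):
    """Finite-rank estimate of [Pf](x) = E[f(Y) | X = x] from samples (x_i, y_i):
    kernel ridge regression of f(y_i) onto x_i with a Gaussian kernel."""
    def gauss_kernel(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * ell ** 2))
    n = x_train.shape[0]
    K = gauss_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), f(y_train))  # ridge coefficients
    return lambda x_new: gauss_kernel(x_new, x_train) @ alpha     # x_new: (m, d) -> (m,)
```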
arXiv Detail & Related papers (2020-12-23T19:06:12Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
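A minimal sketch of why positive-feature costs give linear-time iterations: taking $\varepsilon = 1$ for simplicity, the Gibbs kernel $\exp(-c(x,y))$ equals $\langle\varphi(x), \varphi(y)\rangle$, so the $n \times n$ kernel matrix factorises as $\Phi_x \Phi_y^\top$ with rank $r$, and every Sinkhorn matrix-vector product costs $O(nr)$. The factorised scaling loop below is an assumption-laden illustration, not the paper's full algorithm.

```python
import numpy as np

def lowrank_sinkhorn_scalings(phi_x, phi_y, a, b, n_iter=200):
    """Sinkhorn scaling iterations u = a / (K v), v = b / (K^T u) when the Gibbs
    kernel factorises as K = phi_x @ phi_y.T (positive features, rank r << n),
    so products with K and K^T cost O(n r) instead of O(n^2)."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (phi_x @ (phi_y.T @ v))    # K v in O(n r)
        v = b / (phi_y @ (phi_x.T @ u))    # K^T u in O(n r)
    # plan entries are P_ij = u_i * K_ij * v_j; the dense plan is never materialised
    return u, v
```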
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
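For reference, a bare-bones sketch of asynchronous Q-learning, which updates a single (state, action) entry per sample observed along one Markovian behaviour-policy trajectory; the constant step size and the trajectory format are illustrative assumptions.

```python
import numpy as np

def async_q_learning(trajectory, n_states, n_actions, gamma=0.9, alpha=0.1):
    """Asynchronous Q-learning: one entrywise update per observed transition
    (s, a, r, s_next) from a single trajectory."""
    Q = np.zeros((n_states, n_actions))
    for s, a, r, s_next in trajectory:
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```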
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has been the driver of many recent interests for Wasserstein metrics and barycenters in machine learning.
We show how this bias is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
arXiv Detail & Related papers (2020-06-03T23:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.