Controlling Wasserstein Distances by Kernel Norms with Application to
Compressive Statistical Learning
- URL: http://arxiv.org/abs/2112.00423v3
- Date: Wed, 31 May 2023 09:33:23 GMT
- Title: Controlling Wasserstein Distances by Kernel Norms with Application to
Compressive Statistical Learning
- Authors: Titouan Vayer, Rémi Gribonval
- Abstract summary: This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms.
Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property and show that this property comes with interesting guarantees for compressive statistical learning.
- Score: 4.873362301533825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comparing probability distributions is at the crux of many machine learning
algorithms. Maximum Mean Discrepancies (MMD) and Wasserstein distances are two
classes of distances between probability distributions that have attracted
abundant attention in past years. This paper establishes some conditions under
which the Wasserstein distance can be controlled by MMD norms. Our work is
motivated by the compressive statistical learning (CSL) theory, a general
framework for resource-efficient large scale learning in which the training
data is summarized in a single vector (called sketch) that captures the
information relevant to the considered learning task. Inspired by existing
results in CSL, we introduce the Hölder Lower Restricted Isometric Property
and show that this property comes with interesting guarantees for compressive
statistical learning. Based on the relations between the MMD and the
Wasserstein distances, we provide guarantees for compressive statistical
learning by introducing and studying the concept of Wasserstein regularity of
the learning task, that is when some task-specific metric between probability
distributions can be bounded by a Wasserstein distance.
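
To make the two families of distances in the abstract concrete, here is a minimal sketch (not the authors' code) that estimates a Gaussian-kernel MMD and the 1-Wasserstein distance between two one-dimensional samples. The function names, the Gaussian kernel, the bandwidth, and the shifted-sample setup are all assumptions chosen for illustration; the paper's conditions and constants are not reproduced here.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (illustrative choice of kernel)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the MMD between two 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    squared = mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


def wasserstein_1(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical measures.

    In one dimension the optimal coupling matches sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    for shift in (0.0, 0.5, 1.0, 2.0, 8.0):
        ys = [x + shift for x in xs]  # translated copy of the first sample
        print(f"shift={shift:4.1f}  MMD={mmd(xs, ys):.4f}  W1={wasserstein_1(xs, ys):.4f}")
```

With a Gaussian kernel the MMD can never exceed sqrt(2), because the kernel takes values in (0, 1], whereas the 1-Wasserstein distance of a translated sample grows linearly with the shift. This is one way to see why bounding a Wasserstein distance by an MMD norm requires conditions on the distributions rather than holding unconditionally.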
Related papers
- Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport [11.94799054956877]
We present a decomposition of the Fréchet variance of a set of measures in the 2-Wasserstein space, which allows one to compute the percentage of variance explained by LOT embeddings of those measures.
We also present several experiments that explore the relationship between the dimension of the LOT embedding, the percentage of variance explained, and the classification accuracy of machine learning classifiers built on the embedded data.
arXiv Detail & Related papers (2024-11-15T14:10:52Z) - Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z) - Prototype-based Aleatoric Uncertainty Quantification for Cross-modal
Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to the Aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z) - Mutual Wasserstein Discrepancy Minimization for Sequential
Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework for sequential recommendation based on Mutual WasserStein discrepancy minimization (MStein).
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z) - Distribution Regression with Sliced Wasserstein Kernels [45.916342378789174]
We propose the first OT-based estimator for distribution regression.
We study the theoretical properties of a kernel ridge regression estimator based on such representation.
arXiv Detail & Related papers (2022-02-08T15:21:56Z) - Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-∞, a divide-and-conquer approach to reduce density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
arXiv Detail & Related papers (2021-11-22T06:26:29Z) - Projected Statistical Methods for Distributional Data on the Real Line
with the Wasserstein Metric [0.0]
We present a novel class of projected methods to perform statistical analysis on a data set of probability distributions on the real line.
We focus in particular on Principal Component Analysis (PCA) and regression.
Several theoretical properties of the models are investigated and consistency is proven.
arXiv Detail & Related papers (2021-01-22T10:24:49Z) - Two-sample Test using Projected Wasserstein Distance [18.46110328123008]
We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning.
A key contribution is to couple optimal projection to find the low dimensional linear mapping to maximize the Wasserstein distance between projected probability distributions.
arXiv Detail & Related papers (2020-10-22T18:08:58Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - On Projection Robust Optimal Transport: Sample Complexity and Model
Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z) - Schoenberg-Rao distances: Entropy-based and geometry-aware statistical
Hilbert distances [12.729120803225065]
We study a class of statistical Hilbert distances that we term the Schoenberg-Rao distances.
We derive novel closed-form distances between mixtures of Gaussian distributions.
Our method constitutes a practical alternative to Wasserstein distances and we illustrate its efficiency on a broad range of machine learning tasks.
arXiv Detail & Related papers (2020-02-19T18:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.