Outlier-Robust Optimal Transport: Duality, Structure, and Statistical
Applications
- URL: http://arxiv.org/abs/2111.01361v1
- Date: Tue, 2 Nov 2021 04:05:45 GMT
- Title: Outlier-Robust Optimal Transport: Duality, Structure, and Statistical
Applications
- Authors: Sloan Nietert, Rachel Cummings, Ziv Goldfeld
- Abstract summary: Wasserstein distances are sensitive to outliers in the considered distributions.
We propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution.
- Score: 25.410110072480187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Wasserstein distance, rooted in optimal transport (OT) theory, is a
popular discrepancy measure between probability distributions with various
applications to statistics and machine learning. Despite their rich structure
and demonstrated utility, Wasserstein distances are sensitive to outliers in
the considered distributions, which hinders applicability in practice. Inspired
by the Huber contamination model, we propose a new outlier-robust Wasserstein
distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass
to be removed from each contaminated distribution. Our formulation amounts to a
highly regular optimization problem that lends itself better for analysis
compared to previously considered frameworks. Leveraging this, we conduct a
thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing
characterization of optimal perturbations, regularity, duality, and statistical
estimation and robustness results. In particular, by decoupling the
optimization variables, we arrive at a simple dual form for
$\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary
modification to standard, duality-based OT solvers. We illustrate the benefits
of our framework via applications to generative modeling with contaminated
datasets.
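To make the "remove $\varepsilon$ outlier mass" idea concrete, below is a minimal numerical sketch in Python using only NumPy and SciPy. It casts a robust OT cost between two discrete, one-dimensional empirical measures as a partial-OT-style linear program in which each marginal may leave up to an $\varepsilon$ fraction of its mass unmatched. The constraint set, the 1-D supports, and the helper name `robust_ot_cost` are illustrative assumptions for exposition, not the paper's exact definition of $\mathsf{W}_p^\varepsilon$ or its dual-form implementation.
```python
# Minimal sketch (not the paper's exact formulation): an outlier-robust OT cost
# between two discrete 1-D measures, modeled as a partial-OT linear program in
# which each marginal may drop up to an eps fraction of its mass.
import numpy as np
from scipy.optimize import linprog


def robust_ot_cost(x, y, mu, nu, eps=0.1, p=2):
    """Transport total mass (1 - eps) between (x, mu) and (y, nu),
    letting each side leave up to eps mass unmatched."""
    n, m = len(x), len(y)
    C = np.abs(x[:, None] - y[None, :]) ** p      # pairwise cost |x_i - y_j|^p
    c = C.ravel()                                  # flatten the plan row-major

    # Row sums of the plan must not exceed mu; column sums must not exceed nu.
    A_rows = np.kron(np.eye(n), np.ones(m))        # maps flattened plan -> row sums
    A_cols = np.kron(np.ones(n), np.eye(m))        # maps flattened plan -> column sums
    A_ub = np.vstack([A_rows, A_cols])
    b_ub = np.concatenate([mu, nu])

    # Total transported mass is fixed to 1 - eps (eps mass may be discarded).
    A_eq = np.ones((1, n * m))
    b_eq = np.array([1.0 - eps])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p)


# Usage: a single gross outlier inflates the plain W_p cost but is ignored
# once a small eps of mass is allowed to be removed.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 49), [50.0]])   # contaminated sample
y = rng.normal(0.0, 1.0, 50)                              # clean sample
mu = np.full(50, 1.0 / 50)
nu = np.full(50, 1.0 / 50)
print(robust_ot_cost(x, y, mu, nu, eps=0.0))    # standard OT, dominated by the outlier
print(robust_ot_cost(x, y, mu, nu, eps=0.02))   # robust variant drops the outlier's mass
```
The paper's own route is different: it derives a simple dual form for $\mathsf{W}_p^\varepsilon$ that plugs into standard, duality-based OT solvers (e.g., WGAN-style training), whereas the linear program above only illustrates the primal intuition on small discrete problems.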
Related papers
- A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/\varepsilon$ iterations suffice to approximate the target distribution to within $\varepsilon$ total-variation distance.
Our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation process.
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Distribution learning via neural differential equations: a nonparametric
statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Robust Estimation under the Wasserstein Distance [27.382258439576606]
Given $n$ samples, of which $\varepsilon n$ are adversarially corrupted, we seek an estimate for $\mu$ with minimal Wasserstein error.
We prove new structural properties for POT and use them to show that MDE under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk.
Since the popular Wasserstein generative adversarial network (WGAN) framework implements Wasserstein MDE via Kantorovich duality, our penalized dual enables large-scale generative modeling with contaminated datasets.
arXiv Detail & Related papers (2023-02-02T17:20:25Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Wasserstein Distributional Learning [5.830831796910439]
Wasserstein Distributional Learning (WDL) is a flexible density-on-scalar regression modeling framework.
We show that WDL better characterizes and uncovers the nonlinear dependence of the conditional densities.
We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.
arXiv Detail & Related papers (2022-09-12T02:32:17Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - On Projection Robust Optimal Transport: Sample Complexity and Model
Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.