Robust Estimation under the Wasserstein Distance
- URL: http://arxiv.org/abs/2302.01237v1
- Date: Thu, 2 Feb 2023 17:20:25 GMT
- Title: Robust Estimation under the Wasserstein Distance
- Authors: Sloan Nietert, Rachel Cummings, and Ziv Goldfeld
- Abstract summary: We introduce a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from its input distributions.
We show that minimum distance estimation under $\mathsf{W}_p^\varepsilon$ achieves minimax optimal robust estimation risk.
- Score: 28.792608997509376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of robust distribution estimation under the Wasserstein
metric, a popular discrepancy measure between probability distributions rooted
in optimal transport (OT) theory. We introduce a new outlier-robust Wasserstein
distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass
to be removed from its input distributions, and show that minimum distance
estimation under $\mathsf{W}_p^\varepsilon$ achieves minimax optimal robust
estimation risk. Our analysis is rooted in several new results for partial OT,
including an approximate triangle inequality, which may be of independent
interest. To address computational tractability, we derive a dual formulation
for $\mathsf{W}_p^\varepsilon$ that adds a simple penalty term to the classic
Kantorovich dual objective. As such, $\mathsf{W}_p^\varepsilon$ can be
implemented via an elementary modification to standard, duality-based OT
solvers. Our results are extended to sliced OT, where distributions are
projected onto low-dimensional subspaces, and applications to homogeneity and
independence testing are explored. We illustrate the virtues of our framework
via applications to generative modeling with contaminated datasets.
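To make the mass-removal idea concrete, the sketch below (ours, not the authors' dual-penalty solver) computes an $\varepsilon$-trimmed OT cost between two discrete distributions by solving the corresponding partial-OT linear program directly with SciPy; the function name trimmed_ot, the 1-D cost, and the renormalization by $1-\varepsilon$ are illustrative assumptions.

```python
# Minimal sketch (assumption-laden): epsilon-trimmed OT between two discrete
# 1-D distributions via a partial-OT linear program. It illustrates the
# "remove epsilon outlier mass" idea behind W_p^eps; it is NOT the paper's
# penalized Kantorovich dual and makes no claim to match its estimator.
import numpy as np
from scipy.optimize import linprog

def trimmed_ot(x, y, a, b, eps=0.1, p=2):
    """Transport only (1 - eps) total mass; leftover mass is simply discarded."""
    n, m = len(a), len(b)
    C = np.abs(x[:, None] - y[None, :]) ** p          # |x_i - y_j|^p cost matrix
    A_ub, b_ub = [], []
    for i in range(n):                                # row sums of pi <= a_i
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_ub.append(row); b_ub.append(a[i])
    for j in range(m):                                # column sums of pi <= b_j
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_ub.append(col); b_ub.append(b[j])
    res = linprog(C.ravel(),                          # minimize <pi, C>
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.ones((1, n * m)), b_eq=[1.0 - eps],  # total transported mass
                  bounds=(0, None), method="highs")
    return (res.fun / (1.0 - eps)) ** (1.0 / p)       # renormalize, take p-th root

# Toy check: one gross outlier carrying mass 1/50 in y is ignored once eps >= 1/50.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50)
y = np.concatenate([rng.normal(0.0, 1.0, 49), [50.0]])
a = np.full(50, 1.0 / 50); b = np.full(50, 1.0 / 50)
print(trimmed_ot(x, y, a, b, eps=0.0))                # inflated by the outlier
print(trimmed_ot(x, y, a, b, eps=0.02))               # outlier mass trimmed away
```

With eps = 0.0 the equality constraint forces all mass, including the outlier's, to be transported, so the cost is dominated by the 50-unit displacement; with eps = 0.02 the program simply leaves that mass untransported.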
Related papers
- Private Mean Estimation with Person-Level Differential Privacy [6.621676316292624]
We study person-level differentially private mean estimation in the case where each person holds multiple samples.
We give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP.
arXiv Detail & Related papers (2024-05-30T18:20:35Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds [9.036777309376697]
Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift.
Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$.
arXiv Detail & Related papers (2024-04-02T23:34:39Z) - Near Minimax-Optimal Distributional Temporal Difference Algorithms and The Freedman Inequality in Hilbert Spaces [24.03281329962804]
We propose a non-parametric distributional TD algorithm (NTD) for a $\gamma$-discounted infinite-horizon Markov decision process.
We establish a novel Freedman's inequality in Hilbert spaces, which would be of independent interest.
arXiv Detail & Related papers (2024-03-09T06:19:53Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - The Sketched Wasserstein Distance for mixture distributions [13.643197515573029]
The Sketched Wasserstein Distance ($W^S$) is a new probability distance specifically tailored to finite mixture distributions.
We show that $W^S$ is the most discriminative extension of this metric to the space $\mathcal{S} = \mathrm{conv}(\mathcal{A})$ of mixtures of elements of $\mathcal{A}$.
arXiv Detail & Related papers (2022-06-26T02:33:40Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove an $\Omega(\sqrt{n/p})$ lower bound on the Lipschitz constant of interpolating neural networks with $p$ parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Outlier-Robust Optimal Transport: Duality, Structure, and Statistical
Applications [25.410110072480187]
Wasserstein distances are sensitive to outliers in the considered distributions.
We propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution.
arXiv Detail & Related papers (2021-11-02T04:05:45Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - On Projection Robust Optimal Transport: Sample Complexity and Model
Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces (both constructions are written out schematically after this list).
arXiv Detail & Related papers (2020-06-22T14:35:33Z)
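Schematically, and in notation of our own choosing rather than a quotation from the papers above (with $\pi_E$ the orthogonal projection onto a $k$-dimensional subspace $E$ and $\sigma$ the uniform distribution over such subspaces), the two constructions contrasted in the last entry read

$$\mathsf{PRW}_p(\mu,\nu) \;=\; \sup_{E}\, \mathsf{W}_p\big((\pi_E)_\#\mu,\,(\pi_E)_\#\nu\big),
\qquad
\mathsf{IPRW}_p(\mu,\nu) \;=\; \Big(\int \mathsf{W}_p^p\big((\pi_E)_\#\mu,\,(\pi_E)_\#\nu\big)\,d\sigma(E)\Big)^{1/p}.$$

The sliced OT extension mentioned in the main abstract roughly corresponds to the averaged construction with one-dimensional projections ($k=1$).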