MMD-Regularized Unbalanced Optimal Transport
- URL: http://arxiv.org/abs/2011.05001v9
- Date: Wed, 24 Jan 2024 16:23:21 GMT
- Title: MMD-Regularized Unbalanced Optimal Transport
- Authors: Piyushi Manupriya (IIT Hyderabad, INDIA), J. Saketha Nath (IIT
Hyderabad, INDIA), Pratik Jawanpuria (Microsoft IDC, INDIA)
- Abstract summary: We study the unbalanced optimal transport (UOT) problem, where the marginal constraints are enforced using Maximum Mean Discrepancy (MMD) regularization.
Our work is motivated by the observation that the literature on UOT is focused on regularization based on $\phi$-divergence.
Despite the popularity of MMD, its role as a regularizer in the context of UOT seems less understood.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the unbalanced optimal transport (UOT) problem, where the marginal
constraints are enforced using Maximum Mean Discrepancy (MMD) regularization.
Our work is motivated by the observation that the literature on UOT is focused
on regularization based on $\phi$-divergence (e.g., KL divergence). Despite the
popularity of MMD, its role as a regularizer in the context of UOT seems less
understood. We begin by deriving a specific dual of MMD-regularized UOT
(MMD-UOT), which helps us prove several useful properties. One interesting
outcome of this duality result is that MMD-UOT induces novel metrics, which not
only lift the ground metric like the Wasserstein but are also sample-wise
efficient to estimate like the MMD. Further, for real-world applications
involving non-discrete measures, we present an estimator for the transport plan
that is supported only on the given ($m$) samples. Under certain conditions, we
prove that the estimation error with this finitely-supported transport plan is
also $\mathcal{O}(1/\sqrt{m})$. As far as we know, such error bounds that are
free from the curse of dimensionality are not known for $\phi$-divergence
regularized UOT. Finally, we discuss how the proposed estimator can be computed
efficiently using accelerated gradient descent. Our experiments show that
MMD-UOT consistently outperforms popular baselines, including KL-regularized
UOT and MMD, in diverse machine learning applications. Our codes are publicly
available at https://github.com/Piyushi-0/MMD-reg-OT
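
The abstract describes the ingredients of the proposed estimator (a transport plan supported only on the given samples, squared-MMD penalties on both marginals, accelerated gradient descent) without spelling out the computation. Below is a minimal NumPy sketch of that recipe, assuming a Gaussian kernel, a single regularization weight `lam`, a fixed step size, and a FISTA-style projected loop; these choices and all function names are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np


def gaussian_gram(X, Y, sigma=1.0):
    """Gram matrix of a Gaussian kernel (the kernel choice is illustrative)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def mmd_uot_plan(X, Y, a, b, C, lam=10.0, sigma=1.0, lr=1e-3, iters=2000):
    """Sketch: MMD-regularized UOT plan supported on the given samples.

    Minimizes  <C, pi> + lam * ( ||pi @ 1 - a||_{G_X}^2 + ||pi.T @ 1 - b||_{G_Y}^2 )
    over pi >= 0 with Nesterov-accelerated projected gradient steps, where
    ||v||_G^2 = v.T @ G @ v is the squared MMD of the marginal mismatch.
    """
    Gx, Gy = gaussian_gram(X, X, sigma), gaussian_gram(Y, Y, sigma)
    pi = np.outer(a, b)                      # warm start: product coupling
    z, t = pi.copy(), 1.0
    ones_m, ones_n = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        r = z.sum(axis=1) - a                # row-marginal violation
        c = z.sum(axis=0) - b                # column-marginal violation
        grad = C + 2.0 * lam * (np.outer(Gx @ r, ones_n) + np.outer(ones_m, Gy @ c))
        pi_next = np.maximum(z - lr * grad, 0.0)          # project onto pi >= 0
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = pi_next + ((t - 1.0) / t_next) * (pi_next - pi)  # FISTA momentum
        pi, t = pi_next, t_next
    return pi


# Toy usage: two point clouds in R^2 with uniform weights.
X = np.random.default_rng(0).normal(size=(50, 2))
Y = np.random.default_rng(1).normal(size=(60, 2)) + 1.0
a, b = np.full(50, 1.0 / 50), np.full(60, 1.0 / 60)
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost
plan = mmd_uot_plan(X, Y, a, b, C)
```

The nonnegativity projection is the only constraint in this sketch, since MMD-UOT replaces the hard marginal constraints of balanced OT with the two squared-MMD penalty terms.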
Related papers
- (De)-regularized Maximum Mean Discrepancy Gradient Flow [27.70783952195201]
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow.
DrMMD flow can simultaneously guarantee near-global convergence for a broad class of targets in both continuous and discrete time.
Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime.
arXiv Detail & Related papers (2024-09-23T12:57:42Z) - Partial identification of kernel based two sample tests with mismeasured
data [5.076419064097733]
Two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications.
We study the estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other.
Since the MMD is then only partially identified, we propose a method to estimate its identification bounds, and show that the estimates converge to the sharpest possible bounds on the MMD as the sample size increases.
arXiv Detail & Related papers (2023-08-07T13:21:58Z) - Generative Sliced MMD Flows with Riesz Kernels [0.393259574660092]
Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations.
We show that MMD flows with Riesz kernels $K(x,y) = -\|x-y\|^r$, $r \in (0,2)$, have exceptional properties which allow their efficient computation (a plain estimator with this kernel is sketched after this list).
arXiv Detail & Related papers (2023-05-19T06:33:57Z) - A High-dimensional Convergence Theorem for U-statistics with
Applications to Kernel-based Testing [3.469038201881982]
We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$.
We apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study.
arXiv Detail & Related papers (2023-02-11T12:49:46Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Rethink Maximum Mean Discrepancy for Domain Adaptation [77.2560592127872]
This paper theoretically proves two essential facts: 1) minimizing the Maximum Mean Discrepancy is equivalent to maximizing the source and target intra-class distances respectively while jointly minimizing their variance with some implicit weights, so that the feature discriminability degrades.
Experiments on several benchmark datasets not only confirm the theoretical results but also demonstrate that our approach substantially outperforms comparable state-of-the-art methods.
arXiv Detail & Related papers (2020-07-01T18:25:10Z) - Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur (the standard density-ratio form of PD is recalled after this list).
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z) - A Family of Pairwise Multi-Marginal Optimal Transports that Define a
Generalized Metric [2.650860836597657]
Multi-marginal OT (MMOT) generalizes OT to simultaneously transporting multiple distributions.
We prove new generalized metric properties for a family of pairwise MMOTs.
We illustrate the superiority of our MMOTs over other generalized metrics, and over non-metrics in both synthetic and real tasks.
arXiv Detail & Related papers (2020-01-29T22:10:15Z)
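
On the Riesz-kernel entry above: the sketch below spells out the plain V-statistic estimate of the squared MMD with $K(x,y) = -\|x-y\|^r$. It is the brute-force $O(n^2)$ computation, not the paper's efficient sliced scheme; the function name and the default $r = 1$ are illustrative.

```python
import numpy as np


def riesz_mmd2(X, Y, r=1.0):
    """V-statistic estimate of MMD^2 with the Riesz kernel K(x, y) = -||x - y||^r.

    For r = 1 this coincides, up to a constant factor, with the energy distance.
    Brute-force O(n^2) pairwise computation, meant only to make the kernel concrete.
    """
    def mean_kernel(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return (-(d ** r)).mean()

    return mean_kernel(X, X) + mean_kernel(Y, Y) - 2.0 * mean_kernel(X, Y)
```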
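
On the point-wise dependency entry above: the quantity being estimated is, in its standard form, the density ratio below; this is recalled for context from general knowledge rather than quoted from that paper.

$$
\mathrm{PD}(x, y) \;=\; \frac{p(x, y)}{p(x)\,p(y)}, \qquad
I(X; Y) \;=\; \mathbb{E}_{p(x, y)}\big[\log \mathrm{PD}(x, y)\big],
$$

so mutual information is the expected logarithm of the point-wise dependency, which is why PD estimators double as MI estimators.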