The Monge Gap: A Regularizer to Learn All Transport Maps
- URL: http://arxiv.org/abs/2302.04953v1
- Date: Thu, 9 Feb 2023 21:56:11 GMT
- Title: The Monge Gap: A Regularizer to Learn All Transport Maps
- Authors: Théo Uscidda, Marco Cuturi
- Abstract summary: Brenier's theorem states that when the ground cost is the squared-Euclidean distance, the ``best'' map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function.
Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges.
We propose a radically different approach to estimating OT maps.
- Score: 34.81915836064636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal transport (OT) theory has been used in machine learning to study
and characterize maps that can efficiently push forward a probability measure
onto another. Recent works have drawn inspiration from Brenier's theorem, which
states that when the ground cost is the squared-Euclidean distance, the
``best'' map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another
must be the gradient of a convex function. To exploit that result, [Makkuva+
2020, Korotin+2020] consider maps $T=\nabla f_\theta$, where $f_\theta$ is an
input convex neural network (ICNN), as defined by Amos+2017, and fit $\theta$
with SGD using samples. Despite their mathematical elegance, fitting OT maps
with ICNNs raises many challenges, due notably to the many constraints imposed
on $\theta$; the need to approximate the conjugate of $f_\theta$; or the
limitation that they only work for the squared-Euclidean cost. More generally,
we question the relevance of using Brenier's result, which only applies to
densities, to constrain the architecture of candidate maps fitted on samples.
Motivated by these limitations, we propose a radically different approach to
estimating OT maps: Given a cost $c$ and a reference measure $\rho$, we
introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$.
That gap quantifies how far a map $T$ deviates from the ideal properties we
expect from a $c$-OT map. In practice, we drop all architecture requirements
for $T$ and simply minimize a distance (e.g., the Sinkhorn divergence) between
$T\sharp\mu$ and $\nu$, regularized by $\mathcal{M}^c_\rho(T)$. We study
$\mathcal{M}^c_{\rho}$, and show how our simple pipeline significantly
outperforms other baselines in practice.
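The pipeline described above lends itself to a short numerical sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the Monge gap is the excess of the average displacement cost of $T$ over the optimal transport cost between $\rho$ and $T\sharp\rho$, uses uniform empirical measures, the squared-Euclidean cost, and the POT library; the names `monge_gap`, `fitting_loss` and `lambda_` are illustrative.

```python
# Minimal sketch (not the authors' code) of the Monge-gap regularizer on samples,
# assuming  M^c_rho(T) = E_{x~rho}[ c(x, T(x)) ] - W_c(rho, T#rho),
# i.e. how much the average displacement cost of T exceeds the optimal cost
# of moving rho onto its own push-forward T#rho.
import numpy as np
import ot  # POT: Python Optimal Transport


def monge_gap(x, Tx):
    """Empirical Monge gap for the squared-Euclidean cost.

    x  : (n, d) samples from the reference measure rho
    Tx : (n, d) their images under the candidate map T
    """
    n = x.shape[0]
    a = np.full(n, 1.0 / n)                      # uniform weights
    displacement_cost = np.mean(np.sum((x - Tx) ** 2, axis=1))
    M = ot.dist(x, Tx, metric="sqeuclidean")     # pairwise cost matrix
    ot_cost = ot.emd2(a, a, M)                   # exact OT cost W_c(rho, T#rho)
    return displacement_cost - ot_cost           # >= 0, and 0 iff T is c-optimal on rho


def fitting_loss(Tx, y, reg=0.05):
    """Entropic OT cost between T#mu (samples Tx) and nu (samples y),
    standing in for the Sinkhorn-divergence fitting term of the paper."""
    n, m = Tx.shape[0], y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    M = ot.dist(Tx, y, metric="sqeuclidean")
    return ot.sinkhorn2(a, b, M, reg)


# Hypothetical training objective for a parameterized map T_theta
# (lambda_ is a tunable weight, rho is often taken to be mu itself):
#   loss(theta) = fitting_loss(T_theta(x_mu), y_nu) + lambda_ * monge_gap(x_rho, T_theta(x_rho))
```

In the paper's setting the fitting term is a (debiased) Sinkhorn divergence and $\rho$ is typically the source measure $\mu$ itself; the plain entropic cost above is only a stand-in under those assumptions.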
Related papers
- A Near-Linear Time Algorithm for the Chamfer Distance [21.018781726524946]
Chamfer distance is a popular measure of dissimilarity between point clouds.
We present the first $(1+\epsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time.
Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets.
arXiv Detail & Related papers (2023-07-06T15:07:48Z) - Learning Elastic Costs to Shape Monge Displacements [39.381326738705255]
The Monge problem asks to find the most efficient way to map one distribution onto another.
Elastic costs shape the displacements of Monge maps $T$.
We propose a numerical method to compute Monge maps that are provably optimal.
arXiv Detail & Related papers (2023-06-20T21:17:32Z) - Monge, Bregman and Occam: Interpretable Optimal Transport in
High-Dimensions with Feature-Sparse Maps [37.45959537338404]
We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport.
We showcase the ability of our method to estimate meaningful maps for high-dimensional single-cell transcription data.
arXiv Detail & Related papers (2023-02-08T14:02:34Z) - Supervised Training of Conditional Monge Maps [107.78770597815242]
Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
arXiv Detail & Related papers (2022-06-28T19:34:44Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\kappa\mathcal{C}(\Phi)^2\kappa d}{NT}+\frac{\kappa d}{n}})$ sub-optimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and, for a universal constant $\rho>1$, outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.