The Monge Gap: A Regularizer to Learn All Transport Maps
- URL: http://arxiv.org/abs/2302.04953v1
- Date: Thu, 9 Feb 2023 21:56:11 GMT
- Title: The Monge Gap: A Regularizer to Learn All Transport Maps
- Authors: Théo Uscidda, Marco Cuturi
- Abstract summary: Brenier's theorem states that when the ground cost is the squared-Euclidean distance, the ``best'' map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function.
Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges.
We propose a radically different approach to estimating OT maps.
- Score: 34.81915836064636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal transport (OT) theory has been used in machine learning to study
and characterize maps that can efficiently push forward a probability measure
onto another. Recent works have drawn inspiration from Brenier's theorem, which
states that when the ground cost is the squared-Euclidean distance, the
``best'' map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another
must be the gradient of a convex function. To exploit that result, [Makkuva+
2020, Korotin+2020] consider maps $T=\nabla f_\theta$, where $f_\theta$ is an
input convex neural network (ICNN), as defined by Amos+2017, and fit $\theta$
with SGD using samples. Despite their mathematical elegance, fitting OT maps
with ICNNs raises many challenges, due notably to the many constraints imposed
on $\theta$; the need to approximate the conjugate of $f_\theta$; or the
limitation that they only work for the squared-Euclidean cost. More generally,
we question the relevance of using Brenier's result, which only applies to
densities, to constrain the architecture of candidate maps fitted on samples.
Motivated by these limitations, we propose a radically different approach to
estimating OT maps: Given a cost $c$ and a reference measure $\rho$, we
introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$.
That gap quantifies how far a map $T$ deviates from the ideal properties we
expect from a $c$-OT map. In practice, we drop all architecture requirements
for $T$ and simply minimize a distance (e.g., the Sinkhorn divergence) between
$T\sharp\mu$ and $\nu$, regularized by $\mathcal{M}^c_\rho(T)$. We study
$\mathcal{M}^c_{\rho}$, and show how our simple pipeline significantly
outperforms other baselines in practice.
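The pipeline described above lends itself to a short numerical sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the Monge gap is the excess of the average displacement cost of $T$ over the optimal transport cost between $\rho$ and $T\sharp\rho$, uses uniform empirical measures, the squared-Euclidean cost, and the POT library; the names `monge_gap`, `fitting_loss` and `lambda_` are illustrative.

```python
# Minimal sketch (not the authors' code) of the Monge-gap regularizer on samples,
# assuming  M^c_rho(T) = E_{x~rho}[ c(x, T(x)) ] - W_c(rho, T#rho),
# i.e. how much the average displacement cost of T exceeds the optimal cost
# of moving rho onto its own push-forward T#rho.
import numpy as np
import ot  # POT: Python Optimal Transport


def monge_gap(x, Tx):
    """Empirical Monge gap for the squared-Euclidean cost.

    x  : (n, d) samples from the reference measure rho
    Tx : (n, d) their images under the candidate map T
    """
    n = x.shape[0]
    a = np.full(n, 1.0 / n)                      # uniform weights
    displacement_cost = np.mean(np.sum((x - Tx) ** 2, axis=1))
    M = ot.dist(x, Tx, metric="sqeuclidean")     # pairwise cost matrix
    ot_cost = ot.emd2(a, a, M)                   # exact OT cost W_c(rho, T#rho)
    return displacement_cost - ot_cost           # >= 0, and 0 iff T is c-optimal on rho


def fitting_loss(Tx, y, reg=0.05):
    """Entropic OT cost between T#mu (samples Tx) and nu (samples y),
    standing in for the Sinkhorn-divergence fitting term of the paper."""
    n, m = Tx.shape[0], y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    M = ot.dist(Tx, y, metric="sqeuclidean")
    return ot.sinkhorn2(a, b, M, reg)


# Hypothetical training objective for a parameterized map T_theta
# (lambda_ is a tunable weight, rho is often taken to be mu itself):
#   loss(theta) = fitting_loss(T_theta(x_mu), y_nu) + lambda_ * monge_gap(x_rho, T_theta(x_rho))
```

In the paper's setting the fitting term is a (debiased) Sinkhorn divergence and $\rho$ is typically the source measure $\mu$ itself; the plain entropic cost above is only a stand-in under those assumptions.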
Related papers
- A Near-Linear Time Algorithm for the Chamfer Distance [21.018781726524946]
Chamfer distance is a popular measure of dissimilarity between point clouds.
We present the first $(1+\epsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time.
Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets.
arXiv Detail & Related papers (2023-07-06T15:07:48Z) - Learning Elastic Costs to Shape Monge Displacements [39.381326738705255]
The Monge problem asks to find the most efficient way to map one distribution onto another.
Elastic costs shape the displacements of Monge maps $T$.
We propose a numerical method to compute Monge maps that are provably optimal.
arXiv Detail & Related papers (2023-06-20T21:17:32Z) - Monge, Bregman and Occam: Interpretable Optimal Transport in
High-Dimensions with Feature-Sparse Maps [37.45959537338404]
We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport.
We showcase the ability of our method to estimate meaningful maps for high-dimensional single-cell transcription data.
arXiv Detail & Related papers (2023-02-08T14:02:34Z) - Supervised Training of Conditional Monge Maps [107.78770597815242]
Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
arXiv Detail & Related papers (2022-06-28T19:34:44Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\kappa\mathcal{C}(\Phi)^2\kappa d}{NT}+\frac{\kappa d}{n}})$ sub-optimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and, for a universal constant $\rho>1$, outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.