Learning Elastic Costs to Shape Monge Displacements
- URL: http://arxiv.org/abs/2306.11895v2
- Date: Thu, 23 May 2024 17:00:05 GMT
- Title: Learning Elastic Costs to Shape Monge Displacements
- Authors: Michal Klein, Aram-Alexandre Pooladian, Pierre Ablin, Eugène Ndiaye, Jonathan Niles-Weed, Marco Cuturi,
- Abstract summary: Monge problem asks to find the most efficient way to map one distribution to the other.
elastic costs shape the textitdisplacements of Monge maps $T$.
We propose a numerical method to compute Monge maps that are provably optimal.
- Score: 39.381326738705255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other. This efficiency is quantified by defining a \textit{cost} function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance, $\ell^2_2(\mathbf{x},\mathbf{y})=\tfrac12|\mathbf{x}-\mathbf{y}|_2^2$. Recently, Cuturi et. al '23 highlighted the benefits of using elastic costs, defined through a regularizer $\tau$ as $c(\mathbf{x},\mathbf{y})=\ell^2_2(\mathbf{x},\mathbf{y})+\tau(\mathbf{x}-\mathbf{y})$. Such costs shape the \textit{displacements} of Monge maps $T$, i.e., the difference between a source point and its image $T(\mathbf{x})-\mathbf{x})$, by giving them a structure that matches that of the proximal operator of $\tau$. In this work, we make two important contributions to the study of elastic costs: (i) For any elastic cost, we propose a numerical method to compute Monge maps that are provably optimal. This provides a much-needed routine to create synthetic problems where the ground truth OT map is known, by analogy to the Brenier theorem, which states that the gradient of any convex potential is always a valid Monge map for the $\ell_2^2$ cost; (ii) We propose a loss to \textit{learn} the parameter $\theta$ of a parameterized regularizer $\tau_\theta$, and apply it in the case where $\tau_{A}(\mathbf{z})=|A^\perp \mathbf{z}|^2_2$. This regularizer promotes displacements that lie on a low dimensional subspace of $\mathbb{R}^d$, spanned by the $p$ rows of $A\in\mathbb{R}^{p\times d}$.
Related papers
- Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
We study the problem of learning a single neuron with respect to the $L2$ loss in the presence of adversarial distribution shifts.
A new algorithm is developed to approximate the vector vector squared loss with respect to the worst distribution that is in the $chi2$divergence to the $mathcalp_0$.
arXiv Detail & Related papers (2024-11-11T03:43:52Z) - Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations [7.148312060227714]
Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction.
In this work we seek $mathbfw$ that forms a local reconstruction of $mathbfy$ by solving a regularized least squares regression problem.
We prove that, for all levels of regularization and under a mild condition that the columns of $mathbfX$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$.
arXiv Detail & Related papers (2024-05-01T19:56:52Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Families of costs with zero and nonnegative MTW tensor in optimal
transport [0.0]
We compute explicitly the MTW tensor for the optimal transport problem on $mathbbRn$ with a cost function of form $mathsfc$.
We analyze the $sinh$-type hyperbolic cost, providing examples of $mathsfc$-type functions and divergence.
arXiv Detail & Related papers (2024-01-01T20:33:27Z) - Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels.
$mathbf Y in mathbb Rntimes m, mathbf Pi in mathbb Rntimes p, mathbf B in mathbb Rptimes m$, and $mathbf Win mathbb Rntimes m$, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $mathbbRd$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.