Related papers: Learning Elastic Costs to Shape Monge Displacements

URL: http://arxiv.org/abs/2306.11895v2
Date: Thu, 23 May 2024 17:00:05 GMT
Title: Learning Elastic Costs to Shape Monge Displacements
Authors: Michal Klein, Aram-Alexandre Pooladian, Pierre Ablin, Eugène Ndiaye, Jonathan Niles-Weed, Marco Cuturi
Abstract summary: The Monge problem asks to find the most efficient way to map one distribution to the other. Elastic costs shape the displacements of Monge maps $T$. We propose a numerical method to compute Monge maps that are provably optimal.
Score: 39.381326738705255
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other. This efficiency is quantified by defining a \textit{cost} function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance, $\ell^2_2(\mathbf{x},\mathbf{y})=\tfrac12\|\mathbf{x}-\mathbf{y}\|_2^2$. Recently, Cuturi et al. '23 highlighted the benefits of using elastic costs, defined through a regularizer $\tau$ as $c(\mathbf{x},\mathbf{y})=\ell^2_2(\mathbf{x},\mathbf{y})+\tau(\mathbf{x}-\mathbf{y})$. Such costs shape the \textit{displacements} of Monge maps $T$, i.e., the difference between a source point and its image, $T(\mathbf{x})-\mathbf{x}$, by giving them a structure that matches that of the proximal operator of $\tau$. In this work, we make two important contributions to the study of elastic costs: (i) For any elastic cost, we propose a numerical method to compute Monge maps that are provably optimal. This provides a much-needed routine to create synthetic problems where the ground truth OT map is known, by analogy to the Brenier theorem, which states that the gradient of any convex potential is always a valid Monge map for the $\ell_2^2$ cost; (ii) We propose a loss to \textit{learn} the parameter $\theta$ of a parameterized regularizer $\tau_\theta$, and apply it in the case where $\tau_{A}(\mathbf{z})=\|A^\perp \mathbf{z}\|^2_2$. This regularizer promotes displacements that lie on a low dimensional subspace of $\mathbb{R}^d$, spanned by the $p$ rows of $A\in\mathbb{R}^{p\times d}$.
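
As a rough illustration of the elastic cost described in the abstract, the following NumPy sketch evaluates $c(\mathbf{x},\mathbf{y})=\ell^2_2(\mathbf{x},\mathbf{y})+\tau_A(\mathbf{x}-\mathbf{y})$ and the proximal operator of $\tau_A$. It is not the authors' implementation. It assumes that the rows of $A$ are orthonormal, so that $A^\perp \mathbf{z}$ can be computed as $\mathbf{z}-A^\top A\mathbf{z}$, and it introduces a scaling parameter `gamma` that does not appear in the abstract. All function names are hypothetical.

```python
import numpy as np


def orthonormal_rows(p, d, seed=0):
    """Return a (p, d) matrix with orthonormal rows (assumption: A @ A.T = I)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))  # q has shape (d, p)
    return q.T


def tau_A(z, A, gamma=1.0):
    """gamma * ||A_perp z||^2, with A_perp z = z - A^T A z (orthonormal rows assumed)."""
    z_perp = z - A.T @ (A @ z)
    return gamma * np.sum(z_perp ** 2)


def elastic_cost(x, y, A, gamma=1.0):
    """Elastic cost: half squared Euclidean distance plus the regularizer on x - y."""
    z = x - y
    return 0.5 * np.sum(z ** 2) + tau_A(z, A, gamma)


def prox_tau_A(z, A, gamma=1.0):
    """Minimizer over u of 0.5 * ||u - z||^2 + gamma * ||A_perp u||^2.

    The component of z in the row span of A is kept as is, and the
    orthogonal component is shrunk by a factor 1 / (1 + 2 * gamma).
    """
    z_par = A.T @ (A @ z)
    return z_par + (z - z_par) / (1.0 + 2.0 * gamma)


A = orthonormal_rows(p=2, d=5)
z = np.arange(5.0)
u = prox_tau_A(z, A)
# The part of u orthogonal to the rows of A is smaller than that of z.
print(tau_A(u, A) <= tau_A(z, A))
```

Under these assumptions, a displacement lying in the span of the rows of $A$ incurs no penalty from $\tau_A$, which is the sense in which the regularizer favors low-dimensional displacements.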

Related papers

On the Capacity Region of Individual Key Rates in Vector Linear Secure Aggregation [55.126702858312456]
Our results uncover the novel fact that it is not necessary for every user to hold a key, thereby strictly enlarging the best-known achievable region in the literature.
arXiv Detail & Related papers (2026-01-06T18:34:07Z)
Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination [65.37519531362157]
We show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
arXiv Detail & Related papers (2025-10-12T15:42:44Z)
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs [0.0]
We provide evidence of computational hardness for two problems. One of the main ingredients in our proof is to derive certain forms of algorithmic contiguity between two probability measures. This framework provides a useful tool for performing reductions between different tasks.
arXiv Detail & Related papers (2025-02-14T00:24:51Z)
Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
We study the problem of learning a single neuron with respect to the $L^2$ loss in the presence of adversarial distribution shifts. A new algorithm is developed to approximate the vector that minimizes the squared loss with respect to the worst-case distribution that is close in $\chi^2$-divergence to $\mathcal{p}_0$.
arXiv Detail & Related papers (2024-11-11T03:43:52Z)
Monge-Kantorovich Fitting With Sobolev Budgets [6.748324975906262]
We show that when $\rho$ is concentrated near an $m\text{-d}$ set we may interpret this as a manifold learning problem with noisy data. We quantify $\nu$'s performance in approximating $\rho$ via the Monge-Kantorovich $p$-cost $\mathbb{W}_p^p(\rho, \nu)$, and constrain the complexity by requiring $\mathrm{supp}\, \nu$ to be coverable by an $f : \mathbb{R}^m
arXiv Detail & Related papers (2024-09-25T01:30:16Z)
Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations [7.148312060227714]
Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$.
arXiv Detail & Related papers (2024-05-01T19:56:52Z)
Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
Families of costs with zero and nonnegative MTW tensor in optimal transport [0.0]
We compute explicitly the MTW tensor for the optimal transport problem on $\mathbb{R}^n$ with a cost function of form $\mathsf{c}$. We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-type functions and divergence.
arXiv Detail & Related papers (2024-01-01T20:33:27Z)
Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels. $\mathbf{Y} \in \mathbb{R}^{n\times m}, \mathbf{\Pi} \in \mathbb{R}^{n\times p}, \mathbf{B} \in \mathbb{R}^{p\times m}$, and $\mathbf{W}\in \mathbb{R}^{n\times m}$, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
The case for and against fixed step-size: Stochastic approximation algorithms in optimization and machine learning [6.416429054645991]
Theory and application of stochastic approximation (SA) have become increasingly relevant due in part to applications in optimization and reinforcement learning. This paper takes a new look at SA with constant step-size $\alpha>0$, defined by the recursion $$\theta_{n+1} = \theta_n + \alpha f(\theta_n,\Phi_{n+1})$$ in which $\theta_n\in\mathbb{R}^d$ and $\Phi_n$ is a Markov chain.
arXiv Detail & Related papers (2023-09-06T12:22:32Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations. The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C, \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals. Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.