Monge, Bregman and Occam: Interpretable Optimal Transport in
High-Dimensions with Feature-Sparse Maps
- URL: http://arxiv.org/abs/2302.04065v1
- Date: Wed, 8 Feb 2023 14:02:34 GMT
- Title: Monge, Bregman and Occam: Interpretable Optimal Transport in
High-Dimensions with Feature-Sparse Maps
- Authors: Marco Cuturi, Michal Klein, Pierre Ablin
- Abstract summary: We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport.
We showcase the ability of our method to estimate meaningful maps for high-dimensional single-cell transcription data.
- Score: 37.45959537338404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal transport (OT) theory focuses, among all maps
$T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure
onto another, on those that are the ``thriftiest'', i.e. such that the averaged
cost $c(x, T(x))$ between $x$ and its image $T(x)$ be as small as possible.
Many computational approaches have been proposed to estimate such Monge maps
when $c$ is the $\ell_2^2$ distance, e.g., using entropic maps [Pooladian'22],
or neural networks [Makkuva'20, Korotin'20]. We propose a new model for
transport maps, built on a family of translation invariant costs $c(x,
y):=h(x-y)$, where $h:=\tfrac{1}{2}\|\cdot\|_2^2+\tau$ and $\tau$ is a
regularizer. We propose a generalization of the entropic map suitable for $h$,
and highlight a surprising link tying it with the Bregman centroids of the
divergence $D_h$ generated by $h$, and the proximal operator of $\tau$. We show
that choosing a sparsity-inducing norm for $\tau$ results in maps that apply
Occam's razor to transport, in the sense that the displacement vectors
$\Delta(x):= T(x)-x$ they induce are sparse, with a sparsity pattern that
varies depending on $x$. We showcase the ability of our method to estimate
meaningful OT maps for high-dimensional single-cell transcription data, in the
$34000$-$d$ space of gene counts for cells, without using dimensionality
reduction, thus retaining the ability to interpret all displacements at the
gene level.
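To make the abstract's construction concrete, here is a minimal NumPy sketch, not the paper's exact estimator: it computes entropic barycentric targets with a plain Sinkhorn loop on the squared-Euclidean cost, then sparsifies each displacement with soft-thresholding, i.e. the proximal operator of $\tau=\lambda\|\cdot\|_1$. Function names, the iteration count, and the values of eps and lam are illustrative assumptions, and this two-step heuristic only mimics the single generalized entropic map that the paper ties to Bregman centroids and $\mathrm{prox}_\tau$.

import numpy as np

def soft_threshold(v, lam):
    # Proximal operator of lam * ||.||_1: shrinks entries toward zero, producing exact zeros.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def entropic_barycentric_targets(X, Y, eps=0.05, n_iter=500):
    # Plain Sinkhorn on the squared-Euclidean cost, then barycentric targets E_pi[y | x].
    C = 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # (n, m) cost matrix
    K = np.exp(-C / eps)                                        # Gibbs kernel (well-scaled data assumed)
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)             # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                                     # fixed iteration count for brevity
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                             # entropic coupling
    return (P @ Y) / P.sum(axis=1, keepdims=True)

def feature_sparse_map(X, Y, eps=0.05, lam=0.1):
    # Illustrative feature-sparse transport: sparsify each displacement with the l1 prox.
    bary = entropic_barycentric_targets(X, Y, eps)
    delta = soft_threshold(bary - X, lam)                       # sparse displacement Delta(x)
    return X + delta

Because the thresholding zeroes out most coordinates of $\Delta(x)$, each point (row of X) moves only along a few features, which is what makes the transported single-cell data interpretable at the gene level.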
Related papers
- Displacement-Sparse Neural Optimal Transport [6.968698312185846]
Optimal Transport (OT) theory seeks to determine the map $T: X \to Y$ that transports a source measure $P$ to a target measure $Q$.
We introduce a sparsity penalty into the minimax Wasserstein formulation to promote sparsity in the displacement vectors $\Delta(\mathbf{x}) := T(\mathbf{x}) - \mathbf{x}$.
arXiv Detail & Related papers (2025-02-03T23:44:17Z)
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- $O(k)$-Equivariant Dimensionality Reduction on Stiefel Manifolds [2.0818404738530525]
Many real-world datasets live on high-dimensional Stiefel and Grassmannian manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively.
We propose an algorithm called Principal Stiefel Coordinates (PSC) to reduce data dimensionality from $V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an $O(k)$-equivariant manner.
arXiv Detail & Related papers (2023-09-19T17:21:12Z)
- Learning Elastic Costs to Shape Monge Displacements [39.381326738705255]
The Monge problem asks for the most efficient way to map one distribution onto another.
Elastic costs shape the displacements of Monge maps $T$.
We propose a numerical method to compute Monge maps that are provably optimal.
arXiv Detail & Related papers (2023-06-20T21:17:32Z)
- The Monge Gap: A Regularizer to Learn All Transport Maps [34.81915836064636]
Brenier's theorem states that when the ground cost is the squared-Euclidean distance, the ``best'' map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function.
Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges.
We propose a radically different approach to estimating OT maps.
arXiv Detail & Related papers (2023-02-09T21:56:11Z)
- Supervised Training of Conditional Monge Maps [107.78770597815242]
Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
arXiv Detail & Related papers (2022-06-28T19:34:44Z)
- Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt{d}/\epsilon)$.
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
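The linear-time claim in the last entry is easy to see in code: once the kernel factors through positive features, Sinkhorn never needs to form the $n \times n$ matrix. Below is a hedged sketch assuming the feature maps $\varphi$ are already given as nonnegative arrays phi_x and phi_y; the function name and iteration count are illustrative, not the paper's API.

import numpy as np

def sinkhorn_with_positive_features(phi_x, phi_y, n_iter=200):
    # Sinkhorn when the kernel factors as K = phi_x @ phi_y.T with rank r << n.
    # Each K @ v or K.T @ u costs O(n r) instead of O(n^2), since K is never materialized.
    n, m = phi_x.shape[0], phi_y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (phi_x @ (phi_y.T @ v))               # K @ v through the factors
        v = b / (phi_y @ (phi_x.T @ u))               # K.T @ u through the factors
    return u, v                                        # implicit plan diag(u) K diag(v)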
This list is automatically generated from the titles and abstracts of the papers in this site.