Supervised Training of Conditional Monge Maps
- URL: http://arxiv.org/abs/2206.14262v2
- Date: Fri, 31 Mar 2023 08:12:59 GMT
- Title: Supervised Training of Conditional Monge Maps
- Authors: Charlotte Bunne, Andreas Krause, Marco Cuturi
- Abstract summary: Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
- Score: 107.78770597815242
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Optimal transport (OT) theory describes general principles to define and
select, among many possible choices, the most efficient way to map a
probability measure onto another. That theory has been mostly used to estimate,
given a pair of source and target probability measures $(\mu, \nu)$, a
parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many
applications, such as predicting cell responses to treatments, pairs of
input/output data measures $(\mu, \nu)$ that define optimal transport problems
do not arise in isolation but are associated with a context $c$, as for
instance a treatment when comparing populations of untreated and treated cells.
To account for that context in OT estimation, we introduce CondOT, a multi-task
approach to estimate a family of OT maps conditioned on a context variable,
using several pairs of measures $\left(\mu_i, \nu_i\right)$ tagged with a
context label $c_i$. CondOT learns a global map $\mathcal{T}_\theta$
conditioned on context that is not only expected to fit all labeled pairs in
the dataset $\left\{\left(c_i,\left(\mu_i, \nu_i\right)\right)\right\}$, i.e.,
$\mathcal{T}_\theta\left(c_i\right) \sharp \mu_i \approx \nu_i$, but should
also generalize to produce meaningful maps $\mathcal{T}_\theta\left(c_{\text{new}}\right)$
when conditioned on unseen contexts $c_{\text{new}}$. Our
approach harnesses and provides a novel usage for partially input convex neural
networks, for which we introduce a robust and efficient initialization strategy
inspired by Gaussian approximations. We demonstrate the ability of CondOT to
infer the effect of an arbitrary combination of genetic or therapeutic
perturbations on single cells, using only observations of the effects of said
perturbations separately.
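As a rough illustration of the training setup, here is a minimal sketch in Python. It is not the authors' PICNN-based implementation: the residual network, the energy-distance loss standing in for the OT objective, and all names below are assumptions made for the example.

    # Minimal sketch of the CondOT idea: learn a single map T(x, c) that pushes
    # each source sample x ~ mu_i toward nu_i, conditioned on the context c_i.
    # NOT the paper's PICNN construction: a plain residual MLP and an
    # energy-distance fit stand in for the OT-based training objective.
    import torch
    import torch.nn as nn

    class ConditionalMap(nn.Module):  # hypothetical name
        def __init__(self, dim, ctx_dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + ctx_dim, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x, c):
            # Residual form: the map stays close to the identity at init.
            c_rep = c.expand(x.shape[0], -1)  # c is a 1-D context vector
            return x + self.net(torch.cat([x, c_rep], dim=1))

    def energy_distance(a, b):
        # Energy distance between two empirical samples: a simple
        # distribution-matching loss that vanishes when the samples agree.
        d = lambda u, v: torch.cdist(u, v).mean()
        return 2 * d(a, b) - d(a, a) - d(b, b)

    def fit_condot(pairs, dim, ctx_dim, steps=500, lr=1e-3):
        """pairs: list of (c_i, mu_i samples, nu_i samples) tensors."""
        T = ConditionalMap(dim, ctx_dim)
        opt = torch.optim.Adam(T.parameters(), lr=lr)
        for _ in range(steps):
            # Fit T_theta(c_i) sharp mu_i to nu_i across all labeled pairs.
            loss = sum(energy_distance(T(mu, c), nu) for c, mu, nu in pairs)
            opt.zero_grad(); loss.backward(); opt.step()
        return T  # can now be queried at an unseen context c_new

In the paper itself, $\mathcal{T}_\theta$ is instead parameterized with a partially input convex neural network, so that for every fixed context the map can be taken as the gradient of a function that is convex in the input.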
Related papers
- The Monge Gap: A Regularizer to Learn All Transport Maps [34.81915836064636]
Brenier's theorem states that when the ground cost is the squared-Euclidean distance, the "best" map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function.
Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges.
We propose a radically different approach to estimating OT maps.
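In symbols, the statement summarized above reads (a standard form of Brenier's theorem, written out here for reference):

    % Brenier's theorem, for the squared-Euclidean cost and absolutely continuous mu:
    \exists\, f : \mathbb{R}^d \to \mathbb{R} \ \text{convex such that} \quad
    T^\star = \nabla f \quad \text{and} \quad (\nabla f)_\sharp\, \mu = \nu.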
arXiv Detail & Related papers (2023-02-09T21:56:11Z)
- Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps [37.45959537338404]
We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport.
We showcase the ability of our method to estimate meaningful maps for high-dimensional single-cell transcription data.
arXiv Detail & Related papers (2023-02-08T14:02:34Z)
- Dynamic Ranking and Translation Synchronization [3.946250592943285]
We study an extension of the translation synchronization problem to the dynamic setting.
We propose two estimators -- one based on a smoothness-penalized least squares approach and the other based on projection onto the low-frequency eigenspace of a suitable smoothness operator. A sketch of the first follows.
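A minimal sketch of the smoothness-penalized least-squares idea, under an assumed formulation (in static translation synchronization one recovers offsets $x_i$ from noisy pairwise differences $x_i - x_j$; the dynamic variant below, including the penalty weight lam, is illustrative, not the paper's exact estimator):

    # At each time t we see noisy offsets y[t, i, j] ~ x_t[i] - x_t[j] on observed
    # pairs; per-time least-squares fits are coupled by a temporal smoothness term.
    import numpy as np

    def dynamic_translation_sync(y, mask, lam=1.0):
        """y, mask: (T, n, n) arrays of offsets and observed-pair indicators."""
        T, n, _ = y.shape
        A = np.zeros((T * n, T * n)); b = np.zeros(T * n)  # normal equations
        for t in range(T):
            for i in range(n):
                for j in range(n):
                    if mask[t, i, j]:
                        # data term: (x_t[i] - x_t[j] - y[t, i, j])^2
                        ii, jj = t * n + i, t * n + j
                        A[ii, ii] += 1; A[jj, jj] += 1
                        A[ii, jj] -= 1; A[jj, ii] -= 1
                        b[ii] += y[t, i, j]; b[jj] -= y[t, i, j]
        for t in range(T - 1):
            for i in range(n):
                # smoothness term: lam * (x_t[i] - x_{t+1}[i])^2
                ii, jj = t * n + i, (t + 1) * n + i
                A[ii, ii] += lam; A[jj, jj] += lam
                A[ii, jj] -= lam; A[jj, ii] -= lam
        # Offsets are identifiable only up to a global shift, so A is singular;
        # lstsq returns the minimum-norm solution.
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        return x.reshape(T, n)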
arXiv Detail & Related papers (2022-07-04T14:45:12Z)
- Likelihood estimation of sparse topic distributions in topic models and its applications to Wasserstein document distance calculations [3.679981089267181]
In topic models, the $p \times n$ expected word frequency matrix is assumed to factor as the product of a $p \times K$ word-topic matrix $A$ and a $K \times n$ topic-document matrix $T$.
The columns of $A$ are viewed as $p$-dimensional mixture components that are common to all documents.
When $A$ is unknown, we estimate $T$ by optimizing the likelihood function corresponding to a plug-in, generic, estimator $\hat{A}$ of $A$.
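To make the plug-in idea concrete, here is a standard EM-style maximum-likelihood fit of one document's topic weights given an estimated $\hat{A}$ (illustrative only; the paper's own estimator and guarantees differ):

    # Given a plug-in word-topic matrix A_hat (p x K) and word counts y (p,),
    # maximize the multinomial likelihood over a topic distribution t via EM.
    import numpy as np

    def mle_topic_weights(A_hat, y, iters=200):
        p, K = A_hat.shape
        t = np.full(K, 1.0 / K)          # start at the simplex barycenter
        for _ in range(iters):
            q = A_hat @ t                # current expected word frequencies
            t = t * (A_hat.T @ (y / q))  # multiplicative EM update
            t /= t.sum()                 # renormalize onto the simplex
        return t

Each update multiplies $t_k$ by the responsibility-weighted word counts and renormalizes, which monotonically increases the likelihood.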
arXiv Detail & Related papers (2021-07-12T22:22:32Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
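To see why such factored costs help, note that at unit regularization the Gibbs kernel $e^{-c(x,y)}$ equals $\langle \varphi(x), \varphi(y) \rangle$, so the kernel matrix factors through the $r$ features and never needs to be formed. A small illustrative sketch (all names and sizes below are hypothetical):

    # With c(x, y) = -log <phi(x), phi(y)>, the kernel K[i, j] = <phi(x_i), phi(y_j)>
    # factors as K = Phi_x @ Phi_y.T, so K @ v costs O((n + m) r) instead of O(n m).
    import numpy as np

    n, m, r = 2000, 1500, 50
    phi_x = np.abs(np.random.randn(n, r))  # rows phi(x_i), in the positive orthant
    phi_y = np.abs(np.random.randn(m, r))  # rows phi(y_j)
    v = np.random.rand(m)

    kv_fast = phi_x @ (phi_y.T @ v)        # linear in n + m, K never materialized
    kv_slow = (phi_x @ phi_y.T) @ v        # quadratic: builds the n x m kernel
    assert np.allclose(kv_fast, kv_slow)

Sinkhorn iterations consist of exactly such kernel-vector products, which is what makes the overall divergence computation linear-time.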
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Adaptive exponential power distribution with moving estimator for nonstationary time series [0.8702432681310399]
We will focus on maximum likelihood (ML) adaptive estimation for nonstationary time series.
We focus on the following example: the exponential power distribution (EPD) family $\rho(x) \propto \exp(-|(x-\mu)/\sigma|^\kappa/\kappa)$.
It is tested on daily log-return series for DJIA companies, leading to essentially better log-likelihoods than standard (static) estimation.
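The moving-estimator idea can be sketched in a few lines (illustrative only; the forgetting rate eta and the update scheme below are assumptions, not the paper's exact procedure): the plain averages in the static ML equations are replaced by exponentially weighted ones, so old observations decay and the fitted $\mu$, $\sigma$ track the nonstationarity.

    # Exponentially weighted ML-style estimates for an EPD with fixed shape kappa.
    # Static ML gives sigma^kappa = mean |x - mu|^kappa; here means decay with rate eta.
    import numpy as np

    def moving_epd_estimates(x, kappa=2.0, eta=0.05):
        mu, m_k = float(x[0]), 1.0
        mus, sigmas = [], []
        for xt in x:
            mu = (1 - eta) * mu + eta * xt                   # moving location
            m_k = (1 - eta) * m_k + eta * abs(xt - mu) ** kappa
            sigmas.append(m_k ** (1.0 / kappa))              # moving scale
            mus.append(mu)
        return np.array(mus), np.array(sigmas)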
arXiv Detail & Related papers (2020-03-04T15:56:44Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
- Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented (including all of the above) and is not responsible for any consequences of its use.