Supervised Training of Conditional Monge Maps
- URL: http://arxiv.org/abs/2206.14262v2
- Date: Fri, 31 Mar 2023 08:12:59 GMT
- Title: Supervised Training of Conditional Monge Maps
- Authors: Charlotte Bunne, Andreas Krause, Marco Cuturi
- Abstract summary: Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
- Score: 107.78770597815242
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Optimal transport (OT) theory describes general principles to define and
select, among many possible choices, the most efficient way to map a
probability measure onto another. That theory has been mostly used to estimate,
given a pair of source and target probability measures $(\mu, \nu)$, a
parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many
applications, such as predicting cell responses to treatments, pairs of
input/output data measures $(\mu, \nu)$ that define optimal transport problems
do not arise in isolation but are associated with a context $c$, as for
instance a treatment when comparing populations of untreated and treated cells.
To account for that context in OT estimation, we introduce CondOT, a multi-task
approach to estimate a family of OT maps conditioned on a context variable,
using several pairs of measures $\left(\mu_i, \nu_i\right)$ tagged with a
context label $c_i$. CondOT learns a global map $\mathcal{T}_\theta$
conditioned on context that is not only expected to fit all labeled pairs in
the dataset $\left\{\left(c_i,\left(\mu_i, \nu_i\right)\right)\right\}$, i.e.,
$\mathcal{T}_\theta\left(c_i\right) \sharp \mu_i \approx \nu_i$, but should
also generalize to produce meaningful maps $\mathcal{T}_\theta\left(c_{\text{new}}\right)$
when conditioned on unseen contexts $c_{\text{new}}$. Our
approach harnesses and provides a novel usage for partially input convex neural
networks, for which we introduce a robust and efficient initialization strategy
inspired by Gaussian approximations. We demonstrate the ability of CondOT to
infer the effect of an arbitrary combination of genetic or therapeutic
perturbations on single cells, using only observations of the effects of said
perturbations separately.
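As a rough illustration of the training setup, here is a minimal sketch in Python. It is not the authors' PICNN-based implementation: the residual network, the energy-distance loss standing in for the OT objective, and all names below are assumptions made for the example.

    # Minimal sketch of the CondOT idea: learn a single map T(x, c) that pushes
    # each source sample x ~ mu_i toward nu_i, conditioned on the context c_i.
    # NOT the paper's PICNN construction: a plain residual MLP and an
    # energy-distance fit stand in for the OT-based training objective.
    import torch
    import torch.nn as nn

    class ConditionalMap(nn.Module):  # hypothetical name
        def __init__(self, dim, ctx_dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + ctx_dim, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x, c):
            # Residual form: the map stays close to the identity at init.
            c_rep = c.expand(x.shape[0], -1)  # c is a 1-D context vector
            return x + self.net(torch.cat([x, c_rep], dim=1))

    def energy_distance(a, b):
        # Energy distance between two empirical samples: a simple
        # distribution-matching loss that vanishes when the samples agree.
        d = lambda u, v: torch.cdist(u, v).mean()
        return 2 * d(a, b) - d(a, a) - d(b, b)

    def fit_condot(pairs, dim, ctx_dim, steps=500, lr=1e-3):
        """pairs: list of (c_i, mu_i samples, nu_i samples) tensors."""
        T = ConditionalMap(dim, ctx_dim)
        opt = torch.optim.Adam(T.parameters(), lr=lr)
        for _ in range(steps):
            # Fit T_theta(c_i) sharp mu_i to nu_i across all labeled pairs.
            loss = sum(energy_distance(T(mu, c), nu) for c, mu, nu in pairs)
            opt.zero_grad(); loss.backward(); opt.step()
        return T  # can now be queried at an unseen context c_new

In the paper itself, $\mathcal{T}_\theta$ is instead parameterized with a partially input convex neural network, so that for every fixed context the map can be taken as the gradient of a function that is convex in the input.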
Related papers
- The Monge Gap: A Regularizer to Learn All Transport Maps [34.81915836064636]
Brenier's theorem states that when the ground cost is the squared-Euclidean distance, the "best" map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function.
Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges.
We propose a radically different approach to estimating OT maps.
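In symbols, the statement summarized above reads (a standard form of Brenier's theorem, written out here for reference):

    % Brenier's theorem, for the squared-Euclidean cost and absolutely continuous mu:
    \exists\, f : \mathbb{R}^d \to \mathbb{R} \ \text{convex such that} \quad
    T^\star = \nabla f \quad \text{and} \quad (\nabla f)_\sharp\, \mu = \nu.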
arXiv Detail & Related papers (2023-02-09T21:56:11Z)
- Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps [37.45959537338404]
We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport.
We showcase the ability of our method to estimate meaningful maps for high-dimensional single-cell transcription data.
arXiv Detail & Related papers (2023-02-08T14:02:34Z)
- Dynamic Ranking and Translation Synchronization [3.946250592943285]
We study an extension of the translation synchronization problem to the dynamic setting.
We propose two estimators -- one based on a smoothness-penalized least squares approach and the other based on projection onto the low-frequency eigenspace of a suitable smoothness operator. A sketch of the first follows.
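A minimal sketch of the smoothness-penalized least-squares idea, under an assumed formulation (in static translation synchronization one recovers offsets $x_i$ from noisy pairwise differences $x_i - x_j$; the dynamic variant below, including the penalty weight lam, is illustrative, not the paper's exact estimator):

    # At each time t we see noisy offsets y[t, i, j] ~ x_t[i] - x_t[j] on observed
    # pairs; per-time least-squares fits are coupled by a temporal smoothness term.
    import numpy as np

    def dynamic_translation_sync(y, mask, lam=1.0):
        """y, mask: (T, n, n) arrays of offsets and observed-pair indicators."""
        T, n, _ = y.shape
        A = np.zeros((T * n, T * n)); b = np.zeros(T * n)  # normal equations
        for t in range(T):
            for i in range(n):
                for j in range(n):
                    if mask[t, i, j]:
                        # data term: (x_t[i] - x_t[j] - y[t, i, j])^2
                        ii, jj = t * n + i, t * n + j
                        A[ii, ii] += 1; A[jj, jj] += 1
                        A[ii, jj] -= 1; A[jj, ii] -= 1
                        b[ii] += y[t, i, j]; b[jj] -= y[t, i, j]
        for t in range(T - 1):
            for i in range(n):
                # smoothness term: lam * (x_t[i] - x_{t+1}[i])^2
                ii, jj = t * n + i, (t + 1) * n + i
                A[ii, ii] += lam; A[jj, jj] += lam
                A[ii, jj] -= lam; A[jj, ii] -= lam
        # Offsets are identifiable only up to a global shift, so A is singular;
        # lstsq returns the minimum-norm solution.
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        return x.reshape(T, n)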
arXiv Detail & Related papers (2022-07-04T14:45:12Z)
- Likelihood estimation of sparse topic distributions in topic models and its applications to Wasserstein document distance calculations [3.679981089267181]
In topic models, the $p \times n$ expected word frequency matrix is assumed to factor as the product of a $p \times K$ word-topic matrix $A$ and a $K \times n$ topic-document matrix $T$.
The columns of $A$ are viewed as $p$-dimensional mixture components that are common to all documents.
When $A$ is unknown, we estimate $T$ by optimizing the likelihood function corresponding to a plug-in, generic, estimator $\hat{A}$ of $A$.
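To make the plug-in idea concrete, here is a standard EM-style maximum-likelihood fit of one document's topic weights given an estimated $\hat{A}$ (illustrative only; the paper's own estimator and guarantees differ):

    # Given a plug-in word-topic matrix A_hat (p x K) and word counts y (p,),
    # maximize the multinomial likelihood over a topic distribution t via EM.
    import numpy as np

    def mle_topic_weights(A_hat, y, iters=200):
        p, K = A_hat.shape
        t = np.full(K, 1.0 / K)          # start at the simplex barycenter
        for _ in range(iters):
            q = A_hat @ t                # current expected word frequencies
            t = t * (A_hat.T @ (y / q))  # multiplicative EM update
            t /= t.sum()                 # renormalize onto the simplex
        return t

Each update multiplies $t_k$ by the responsibility-weighted word counts and renormalizes, which monotonically increases the likelihood.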
arXiv Detail & Related papers (2021-07-12T22:22:32Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
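To see why such factored costs help, note that at unit regularization the Gibbs kernel $e^{-c(x,y)}$ equals $\langle \varphi(x), \varphi(y) \rangle$, so the kernel matrix factors through the $r$ features and never needs to be formed. A small illustrative sketch (all names and sizes below are hypothetical):

    # With c(x, y) = -log <phi(x), phi(y)>, the kernel K[i, j] = <phi(x_i), phi(y_j)>
    # factors as K = Phi_x @ Phi_y.T, so K @ v costs O((n + m) r) instead of O(n m).
    import numpy as np

    n, m, r = 2000, 1500, 50
    phi_x = np.abs(np.random.randn(n, r))  # rows phi(x_i), in the positive orthant
    phi_y = np.abs(np.random.randn(m, r))  # rows phi(y_j)
    v = np.random.rand(m)

    kv_fast = phi_x @ (phi_y.T @ v)        # linear in n + m, K never materialized
    kv_slow = (phi_x @ phi_y.T) @ v        # quadratic: builds the n x m kernel
    assert np.allclose(kv_fast, kv_slow)

Sinkhorn iterations consist of exactly such kernel-vector products, which is what makes the overall divergence computation linear-time.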
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Adaptive exponential power distribution with moving estimator for nonstationary time series [0.8702432681310399]
We will focus on maximum likelihood (ML) adaptive estimation for nonstationary time series.
We focus on the following example: the exponential power distribution (EPD) family $\rho(x) \propto \exp(-|(x-\mu)/\sigma|^\kappa/\kappa)$.
It is tested on daily log-return series for DJIA companies, leading to essentially better log-likelihoods than standard (static) estimation.
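The moving-estimator idea can be sketched in a few lines (illustrative only; the forgetting rate eta and the update scheme below are assumptions, not the paper's exact procedure): the plain averages in the static ML equations are replaced by exponentially weighted ones, so old observations decay and the fitted $\mu$, $\sigma$ track the nonstationarity.

    # Exponentially weighted ML-style estimates for an EPD with fixed shape kappa.
    # Static ML gives sigma^kappa = mean |x - mu|^kappa; here means decay with rate eta.
    import numpy as np

    def moving_epd_estimates(x, kappa=2.0, eta=0.05):
        mu, m_k = float(x[0]), 1.0
        mus, sigmas = [], []
        for xt in x:
            mu = (1 - eta) * mu + eta * xt                   # moving location
            m_k = (1 - eta) * m_k + eta * abs(xt - mu) ** kappa
            sigmas.append(m_k ** (1.0 / kappa))              # moving scale
            mus.append(mu)
        return np.array(mus), np.array(sigmas)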
arXiv Detail & Related papers (2020-03-04T15:56:44Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
- Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented (including all of the above) and is not responsible for any consequences of its use.