Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds
- URL: http://arxiv.org/abs/2211.00805v2
- Date: Tue, 26 Sep 2023 13:12:52 GMT
- Title: Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds
- Authors: Guillaume Huguet, Alexander Tong, Mar\'ia Ramos Zapatero, Christopher
J. Tape, Guy Wolf, Smita Krishnaswamy
- Abstract summary: We propose Geodesic Sinkhorn -- based on a heat kernel on a manifold graph.
We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy.
- Score: 53.110934987571355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient computation of optimal transport distance between distributions is
of growing importance in data science. Sinkhorn-based methods are currently the
state-of-the-art for such computations, but require $O(n^2)$ computations. In
addition, Sinkhorn-based methods commonly use an Euclidean ground distance
between datapoints. However, with the prevalence of manifold structured
scientific data, it is often desirable to consider geodesic ground distance.
Here, we tackle both issues by proposing Geodesic Sinkhorn -- based on
diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn
requires only $O(n\log n)$ computation, as we approximate the heat kernel with
Chebyshev polynomials based on the sparse graph Laplacian. We apply our method
to the computation of barycenters of several distributions of high dimensional
single cell data from patient samples undergoing chemotherapy. In particular,
we define the barycentric distance as the distance between two such
barycenters. Using this definition, we identify an optimal transport distance
and path associated with the effect of treatment on cellular data.
Related papers
- Relative-Translation Invariant Wasserstein Distance [82.6068808353647]
We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$)
We show that $RW_p distances are also real distance metrics defined on the quotient set $mathcalP_p(mathbbRn)/sim$ invariant to distribution translations.
arXiv Detail & Related papers (2024-09-04T03:41:44Z) - Sketching the Heat Kernel: Using Gaussian Processes to Embed Data [4.220336689294244]
We introduce a novel, non-deterministic method for embedding data in low-dimensional Euclidean space based on realizations of a Gaussian process depending on the geometry of the data.
Our method demonstrates further advantage in its robustness to outliers.
arXiv Detail & Related papers (2024-03-01T22:56:19Z) - Bivariate DeepKriging for Large-scale Spatial Interpolation of Wind Fields [2.586710925821896]
High spatial resolution wind data are essential for a wide range of applications in climate, oceanographic and meteorological studies.
Large-scale spatial computation or downscaling of bivariate wind fields having velocity in two dimensions is a challenging task.
In this paper, we propose a method, called bivariate DeepKriging, which is a spatially dependent deep neural network (DNN) with an embedding layer constructed by spatial radial basis functions.
We demonstrate the computational efficiency and scalability of the proposed DNN model, with computations that are, on average, 20 times faster than those of conventional techniques.
arXiv Detail & Related papers (2023-07-16T13:34:44Z) - Graph Fourier MMD for Signals on Graphs [67.68356461123219]
We propose a novel distance between distributions and signals on graphs.
GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation.
We showcase it on graph benchmark datasets as well as on single cell RNA-sequencing data analysis.
arXiv Detail & Related papers (2023-06-05T00:01:17Z) - GraphWalks: Efficient Shape Agnostic Geodesic Shortest Path Estimation [93.60478281489243]
We propose a learnable network to approximate geodesic paths on 3D surfaces.
The proposed method provides efficient approximations of the shortest paths and geodesic distances estimations.
arXiv Detail & Related papers (2022-05-30T16:22:53Z) - Entropic Optimal Transport in Random Graphs [8.7314407902481]
In graph analysis, a classic task consists in computing similarity measures between (groups of) nodes.
We show that it is possible to consistently estimate distances between groups of nodes in the latent space.
arXiv Detail & Related papers (2022-01-11T13:52:34Z) - Near-optimal estimation of smooth transport maps with kernel
sums-of-squares [81.02564078640275]
Under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.
The object of interest for applications such as generative modeling is the underlying optimal transport map.
We propose the first tractable algorithm for which the statistical $L2$ error on the maps nearly matches the existing minimax lower-bounds for smooth map estimation.
arXiv Detail & Related papers (2021-12-03T13:45:36Z) - Fast and Scalable Optimal Transport for Brain Tractograms [4.610968512889579]
We present a new multiscale algorithm for solving regularized Optimal Transport problems on a linear memory footprint.
We show the effectiveness of this approach on brain tractograms modeled either as bundles of fibers or as track density maps.
arXiv Detail & Related papers (2021-07-05T13:28:41Z) - Combining Pseudo-Point and State Space Approximations for Sum-Separable
Gaussian Processes [48.64129867897491]
We show that there is a simple and elegant way to combine pseudo-point methods with the state space GP approximation framework to get the best of both worlds.
We demonstrate that the combined approach is more scalable and applicable to a greater range of epidemiology--temporal problems than either method on its own.
arXiv Detail & Related papers (2021-06-18T16:30:09Z) - Convergence of Gaussian-smoothed optimal transport distance with
sub-gamma distributions and dependent samples [12.77426855794452]
This paper provides convergence guarantees for estimating the GOT distance under more general settings.
A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy distances.
arXiv Detail & Related papers (2021-02-28T04:30:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.