Learning Ultrametric Trees for Optimal Transport Regression
- URL: http://arxiv.org/abs/2210.12288v2
- Date: Fri, 26 Jan 2024 22:03:58 GMT
- Title: Learning Ultrametric Trees for Optimal Transport Regression
- Authors: Samantha Chen, Puoya Tabaghi, Yusu Wang
- Abstract summary: We find an optimal tree structure for a given discrete metric space.
One of our key ideas is to cast the problem in ultrametric spaces.
- Score: 10.524752369156337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimal transport provides a metric which quantifies the dissimilarity
between probability measures. For measures supported in discrete metric spaces,
finding the optimal transport distance has cubic time complexity in the size of
the space. However, measures supported on trees admit a closed-form optimal
transport that can be computed in linear time. In this paper, we aim to find an
optimal tree structure for a given discrete metric space so that the
tree-Wasserstein distance approximates the optimal transport distance in the
original space. One of our key ideas is to cast the problem in ultrametric
spaces. This helps us optimize over the space of ultrametric trees -- a
mixed-discrete and continuous optimization problem -- via projected gradient
decent over the space of ultrametric matrices. During optimization, we project
the parameters to the ultrametric space via a hierarchical minimum spanning
tree algorithm, equivalent to the closest projection to ultrametrics under the
supremum norm. Experimental results on real datasets show that our approach
outperforms previous approaches (e.g. Flowtree, Quadtree) in approximating
optimal transport distances. Finally, experiments on synthetic data generated
on ground truth trees show that our algorithm can accurately uncover the
underlying trees.
Related papers
- Fast unsupervised ground metric learning with tree-Wasserstein distance [14.235762519615175]
unsupervised ground metric learning approaches have been introduced.
We propose to augment the WSV method by embedding samples and features on trees, on which we compute the tree-Wasserstein distance (TWD)
We demonstrate theoretically and empirically that the algorithm converges to a better approximation of the full WSV approach than the best known alternatives, and does so with $mathcalO(n3)$ complexity.
arXiv Detail & Related papers (2024-11-11T23:21:01Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds [53.110934987571355]
We propose Geodesic Sinkhorn -- based on a heat kernel on a manifold graph.
We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy.
arXiv Detail & Related papers (2022-11-02T00:51:35Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Near-optimal estimation of smooth transport maps with kernel
sums-of-squares [81.02564078640275]
Under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.
The object of interest for applications such as generative modeling is the underlying optimal transport map.
We propose the first tractable algorithm for which the statistical $L2$ error on the maps nearly matches the existing minimax lower-bounds for smooth map estimation.
arXiv Detail & Related papers (2021-12-03T13:45:36Z) - Entropic estimation of optimal transport maps [15.685006881635209]
We develop a method for estimating the optimal map between two distributions over $mathbbRd$ with rigorous finite-sample guarantees.
We show that our estimator is easy to compute using Sinkhorn's algorithm.
arXiv Detail & Related papers (2021-09-24T14:57:26Z) - Optimal Decision Trees for Nonlinear Metrics [42.18286681448184]
We show a novel algorithm for producing optimal trees for nonlinear metrics.
To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics.
Our approach leads to a trade-off when compared to optimising linear metrics.
arXiv Detail & Related papers (2020-09-15T08:30:56Z) - Fast Unbalanced Optimal Transport on a Tree [40.60905158071766]
This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time.
We prove that the Kantorovich Rubinstein distance and optimal partial transport in the Euclidean metric cannot be computed in strongly subquadratic time.
Then, we propose an algorithm that solves a more general unbalanced optimal transport problem exactly in quasi-linear time on a tree metric.
arXiv Detail & Related papers (2020-06-04T08:43:59Z) - A Practical Index Structure Supporting Fr\'echet Proximity Queries Among
Trajectories [1.9335262420787858]
We present a scalable approach for range and $k$ nearest neighbor queries under computationally expensive metrics.
Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories.
We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world data sets.
arXiv Detail & Related papers (2020-05-28T04:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.