Related papers: Learning Ultrametric Trees for Optimal Transport Regression

Learning Ultrametric Trees for Optimal Transport Regression

URL: http://arxiv.org/abs/2210.12288v2
Date: Fri, 26 Jan 2024 22:03:58 GMT
Title: Learning Ultrametric Trees for Optimal Transport Regression
Authors: Samantha Chen, Puoya Tabaghi, Yusu Wang
Abstract summary: We find an optimal tree structure for a given discrete metric space. One of our key ideas is to cast the problem in ultrametric spaces.
Score: 10.524752369156337
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Optimal transport provides a metric which quantifies the dissimilarity between probability measures. For measures supported in discrete metric spaces, finding the optimal transport distance has cubic time complexity in the size of the space. However, measures supported on trees admit a closed-form optimal transport that can be computed in linear time. In this paper, we aim to find an optimal tree structure for a given discrete metric space so that the tree-Wasserstein distance approximates the optimal transport distance in the original space. One of our key ideas is to cast the problem in ultrametric spaces. This helps us optimize over the space of ultrametric trees -- a mixed-discrete and continuous optimization problem -- via projected gradient decent over the space of ultrametric matrices. During optimization, we project the parameters to the ultrametric space via a hierarchical minimum spanning tree algorithm, equivalent to the closest projection to ultrametrics under the supremum norm. Experimental results on real datasets show that our approach outperforms previous approaches (e.g. Flowtree, Quadtree) in approximating optimal transport distances. Finally, experiments on synthetic data generated on ground truth trees show that our algorithm can accurately uncover the underlying trees.

Related papers

Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces [94.12116458306916]
We show that in $q$-metric spaces, metric trees can leverage a stronger version of the triangle inequality to reduce comparisons for exact search.<n>We propose a novel projection method that embeds datasets with arbitrary dissimilarity measures into $q$-metric spaces.
arXiv Detail & Related papers (2025-06-06T22:09:44Z)
Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity [13.751664797063208]
Given an arbitrary metric space, its deviation from a tree metric can be quantified by Gromov's $delta$-hyperbolicity.<n>We introduce a novel differentiable optimization framework, coined DeltaZero, that solves this problem.<n>Our method consistently achieves state-of-the-art distortion.
arXiv Detail & Related papers (2025-05-27T11:58:37Z)
Fast unsupervised ground metric learning with tree-Wasserstein distance [14.235762519615175]
unsupervised ground metric learning approaches have been introduced. We propose to augment the WSV method by embedding samples and features on trees, on which we compute the tree-Wasserstein distance (TWD) We demonstrate theoretically and empirically that the algorithm converges to a better approximation of the full WSV approach than the best known alternatives, and does so with $mathcalO(n3)$ complexity.
arXiv Detail & Related papers (2024-11-11T23:21:01Z)
Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size. We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds [53.110934987571355]
We propose Geodesic Sinkhorn -- based on a heat kernel on a manifold graph. We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy.
arXiv Detail & Related papers (2022-11-02T00:51:35Z)
Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search. Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function. Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
Near-optimal estimation of smooth transport maps with kernel sums-of-squares [81.02564078640275]
Under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. The object of interest for applications such as generative modeling is the underlying optimal transport map. We propose the first tractable algorithm for which the statistical $L2$ error on the maps nearly matches the existing minimax lower-bounds for smooth map estimation.
arXiv Detail & Related papers (2021-12-03T13:45:36Z)
Entropic estimation of optimal transport maps [15.685006881635209]
We develop a method for estimating the optimal map between two distributions over $mathbbRd$ with rigorous finite-sample guarantees. We show that our estimator is easy to compute using Sinkhorn's algorithm.
arXiv Detail & Related papers (2021-09-24T14:57:26Z)
Optimal Decision Trees for Nonlinear Metrics [42.18286681448184]
We show a novel algorithm for producing optimal trees for nonlinear metrics. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics.
arXiv Detail & Related papers (2020-09-15T08:30:56Z)
Fast Unbalanced Optimal Transport on a Tree [40.60905158071766]
This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time. We prove that the Kantorovich Rubinstein distance and optimal partial transport in the Euclidean metric cannot be computed in strongly subquadratic time. Then, we propose an algorithm that solves a more general unbalanced optimal transport problem exactly in quasi-linear time on a tree metric.
arXiv Detail & Related papers (2020-06-04T08:43:59Z)
A Practical Index Structure Supporting Fr\'echet Proximity Queries Among Trajectories [1.9335262420787858]
We present a scalable approach for range and $k$ nearest neighbor queries under computationally expensive metrics. Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories. We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world data sets.
arXiv Detail & Related papers (2020-05-28T04:12:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.