Hierarchical Refinement: Optimal Transport to Infinity and Beyond
- URL: http://arxiv.org/abs/2503.03025v1
- Date: Tue, 04 Mar 2025 22:00:12 GMT
- Title: Hierarchical Refinement: Optimal Transport to Infinity and Beyond
- Authors: Peter Halmos, Julian Gold, Xinhao Liu, Benjamin J. Raphael
- Abstract summary: Optimal transport (OT) has enjoyed great success in machine learning as a principled way to align datasets via a least-cost correspondence. Sinkhorn has quadratic space complexity in the number of points, limiting its scalability to larger datasets. We derive an algorithm that dynamically constructs a multiscale partition of a dataset using low-rank OT subproblems.
- Score: 1.8749305679160366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal transport (OT) has enjoyed great success in machine learning as a principled way to align datasets via a least-cost correspondence. This success was driven in large part by the runtime efficiency of the Sinkhorn algorithm [Cuturi 2013], which computes a coupling between points from two datasets. However, Sinkhorn has quadratic space complexity in the number of points, limiting its scalability to larger datasets. Low-rank OT achieves linear space complexity, but by definition cannot compute a one-to-one correspondence between points. When the optimal transport problem is an assignment problem between datasets, the optimal mapping, known as the Monge map, is guaranteed to be a bijection. In this setting, we show that the factors of an optimal low-rank coupling co-cluster each point with its image under the Monge map. We leverage this invariant to derive an algorithm, Hierarchical Refinement (HiRef), that dynamically constructs a multiscale partition of a dataset using low-rank OT subproblems, culminating in a bijective coupling. Hierarchical Refinement uses linear space and has log-linear runtime, retaining the space advantage of low-rank OT while overcoming its limited resolution. We demonstrate the advantages of Hierarchical Refinement on several datasets, including ones containing over a million points, scaling full-rank OT to problems previously beyond Sinkhorn's reach.
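The abstract contrasts Sinkhorn's quadratic space with HiRef's linear space. A minimal sketch of entropy-regularized Sinkhorn makes the memory cost concrete: the full n × m cost, kernel, and coupling matrices are materialized. The uniform weights, squared-Euclidean cost, regularization strength, and iteration count here are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sinkhorn(x, y, reg=1.0, n_iters=200):
    """Entropy-regularized OT between two point clouds with uniform weights.

    Materializes the full n x m cost/kernel/coupling matrices -- the
    quadratic space cost the abstract refers to.
    """
    n, m = len(x), len(y)
    # Squared-Euclidean cost matrix (n x m): the quadratic-memory object.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / reg)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        # Alternating scaling updates to match the two marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    # Coupling P = diag(u) K diag(v); rows/columns sum to the marginals.
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(size=(50, 2))
P = sinkhorn(x, y)
```

HiRef avoids materializing any such matrix: it solves low-rank OT subproblems over a recursively refined partition, reaching a bijective coupling in linear space.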
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Fast and Scalable Optimal Transport for Brain Tractograms [4.610968512889579]
We present a new multiscale algorithm for solving regularized Optimal Transport problems on a linear memory footprint.
We show the effectiveness of this approach on brain tractograms modeled either as bundles of fibers or as track density maps.
arXiv Detail & Related papers (2021-07-05T13:28:41Z) - ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering
using Nearest-Neighbor Chain [6.824747267214373]
We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (HAC) algorithms.
Compared to most previous parallel HAC algorithms, our new algorithms require only linear memory, and are scalable to large data sets.
Our algorithms are able to scale to data set sizes with tens of millions of points, which existing algorithms are not able to handle.
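The ParChain entry builds on the nearest-neighbor chain technique: repeatedly follow nearest-neighbor links until a reciprocal pair is found, then merge it. A sequential sketch of that idea is below, restricted to single linkage (a reducible linkage, which the chain argument requires) and using a quadratic distance matrix; this is an illustration of the classical technique, not ParChain's parallel, linear-memory design.

```python
import numpy as np

def nn_chain_single_linkage(X):
    """Agglomerative clustering via the nearest-neighbor chain (single linkage).

    Returns the n-1 merge steps as (cluster_label_i, cluster_label_j, distance),
    with new clusters labeled n, n+1, ... in merge order.
    """
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    active = set(range(n))
    label = list(range(n))          # matrix index -> current cluster label
    next_label = n
    merges, chain = [], []
    while len(active) > 1:
        if not chain:
            chain.append(next(iter(active)))  # start a chain anywhere
        a = chain[-1]
        # Nearest active neighbor of the chain's tip.
        b = min((j for j in active if j != a), key=lambda j: D[a, j])
        if len(chain) > 1 and b == chain[-2]:
            # Reciprocal nearest neighbors: safe to merge under reducibility.
            chain.pop(); chain.pop()
            merges.append((label[a], label[b], D[a, b]))
            # Lance-Williams update for single linkage: min of the two rows.
            for k in active:
                if k not in (a, b):
                    D[a, k] = D[k, a] = min(D[a, k], D[b, k])
            active.remove(b)
            label[a] = next_label   # index a now holds the merged cluster
            next_label += 1
        else:
            chain.append(b)
    return merges

# Four 1-D points in two tight pairs: merges at distances ~0.1, 0.1, 4.9.
merges = nn_chain_single_linkage(np.array([[0.0], [0.1], [5.0], [5.1]]))
```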
arXiv Detail & Related papers (2021-06-08T23:13:27Z) - Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and
Costs [45.87981728307819]
The ability to compare and align related datasets living in heterogeneous spaces plays an increasingly important role in machine learning.
The Gromov-Wasserstein (GW) formalism can help tackle this problem.
arXiv Detail & Related papers (2021-06-02T12:50:56Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - Block majorization-minimization with diminishing radius for constrained nonsmooth nonconvex optimization [8.386501595252]
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonsmooth nonconvex optimization that sequentially minimizes block-wise majorizing surrogates. We show that BMM produces a novel first-order optimality measure for various algorithms. We also demonstrate that the additional use of a diminishing radius can improve the convergence rate of BMM in many instances.
arXiv Detail & Related papers (2020-12-07T07:53:09Z) - A Study of Performance of Optimal Transport [16.847501106437534]
We show that network simplex and augmenting path based algorithms can consistently outperform numerical matrix-scaling based methods.
We present a new algorithm that improves upon the classical Kuhn-Munkres algorithm.
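The Kuhn-Munkres (Hungarian) algorithm mentioned above solves the assignment problem: find the bijection between two equal-size sets that minimizes total cost. This is also the setting in which the Monge map of the main paper is a bijection. A brute-force sketch over all permutations, exponential and purely to make the problem statement concrete (Kuhn-Munkres solves it in O(n^3)), with an illustrative random cost matrix:

```python
import itertools
import numpy as np

# Illustrative 4x4 cost matrix: C[i, j] = cost of assigning i to j.
rng = np.random.default_rng(1)
C = rng.random((4, 4))

# The optimum of an assignment problem is a permutation (a bijection),
# found here by exhaustive search over all 4! candidates.
best_perm = min(itertools.permutations(range(4)),
                key=lambda p: sum(C[i, p[i]] for i in range(4)))
best_cost = sum(C[i, best_perm[i]] for i in range(4))
```

Network-simplex and augmenting-path solvers, which the entry reports outperforming matrix-scaling methods, target this same combinatorial structure directly rather than approximating it.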
arXiv Detail & Related papers (2020-05-03T20:37:05Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z) - Efficient and Robust Shape Correspondence via Sparsity-Enforced
Quadratic Assignment [16.03666555216332]
We introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment.
Our pairwise descriptor is based on the stiffness and mass iteration matrix of finite element approximation of the Laplace-Beltrami differential operator.
We use various experiments to show the efficiency, quality, and versatility of our method on large data sets, patches, and point clouds.
arXiv Detail & Related papers (2020-03-19T10:56:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.