Hierarchical Sliced Wasserstein Distance
- URL: http://arxiv.org/abs/2209.13570v3
- Date: Thu, 29 Sep 2022 01:17:03 GMT
- Title: Hierarchical Sliced Wasserstein Distance
- Authors: Khai Nguyen, Tongzheng Ren, Huy Nguyen, Litu Rout, Tan Nguyen, and Nhat Ho
- Abstract summary: Sliced Wasserstein (SW) distance can be scaled to a large number of supports without suffering from the curse of dimensionality.
Despite its efficiency in the number of supports, estimating the sliced Wasserstein distance still requires a relatively large number of projections in high-dimensional settings.
We propose to derive projections by linearly and randomly combining a smaller number of projections, named bottleneck projections.
We then formulate the approach into a new metric between measures, named Hierarchical Sliced Wasserstein (HSW) distance.
- Score: 27.12983497199479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sliced Wasserstein (SW) distance has been widely used across application
scenarios because it scales to a large number of supports without suffering from
the curse of dimensionality. The value of the sliced Wasserstein distance is the
average transportation cost between one-dimensional representations (projections)
of the original measures, which are obtained via the Radon Transform (RT). Despite
its efficiency in the number of supports, estimating the sliced Wasserstein
distance requires a relatively large number of projections in high-dimensional
settings. Therefore, for applications where the number of supports is small
relative to the dimension, e.g., deep learning applications that rely on
mini-batches, the matrix multiplications of the Radon Transform become the main
computational bottleneck. To address this issue, we propose to derive projections
by linearly and randomly combining a smaller number of projections, named
bottleneck projections. We explain the use of these projections by introducing the
Hierarchical Radon Transform (HRT), which is constructed by applying Radon
Transform variants recursively. We then formulate the approach as a new metric
between measures, named the Hierarchical Sliced Wasserstein (HSW) distance. By
proving the injectivity of HRT, we derive the metricity of HSW. Moreover, we
investigate the theoretical properties of HSW, including its connection to SW
variants and its computational and sample complexities. Finally, we compare the
computational cost and generative quality of HSW with the conventional SW on the
task of deep generative modeling using various benchmark datasets, including
CIFAR10, CelebA, and Tiny ImageNet.
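To make the bottleneck-projection idea concrete, here is a minimal NumPy sketch comparing plain Monte Carlo SW with an HSW-style estimator. This is a sketch under stated assumptions, not the authors' reference implementation: the function names, the choice p = 2, and the Gaussian sampling of bottleneck directions and mixing weights are illustrative. The key point is that only the k bottleneck directions ever multiply the d-dimensional data.

```python
import numpy as np

def sw_1d_pp(x_proj, y_proj, p=2):
    """W_p^p between 1-D empirical measures with equal numbers of points."""
    return np.mean(np.abs(np.sort(x_proj, axis=0) - np.sort(y_proj, axis=0)) ** p, axis=0)

def sliced_wasserstein(x, y, n_proj=100, p=2, seed=0):
    """Plain Monte Carlo SW: one (n x d) @ (d x n_proj) multiply per measure."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    theta = rng.standard_normal((d, n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    return np.mean(sw_1d_pp(x @ theta, y @ theta, p)) ** (1 / p)

def hierarchical_sw(x, y, n_bottleneck=10, n_proj=100, p=2, seed=0):
    """Bottleneck idea: only k << n_proj directions touch the d-dimensional
    data; the n_proj final projections are random linear combinations of them."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    psi = rng.standard_normal((d, n_bottleneck))     # bottleneck projections
    psi /= np.linalg.norm(psi, axis=0, keepdims=True)
    w = rng.standard_normal((n_bottleneck, n_proj))  # random mixing weights
    # Norm of each combined direction psi @ w_j, computed from the small
    # k x k Gram matrix so no extra d-dimensional multiply is needed.
    gram = psi.T @ psi
    norms = np.sqrt(np.einsum('aj,ab,bj->j', w, gram, w))
    x_low, y_low = x @ psi, y @ psi                  # the only d-dim multiplies
    return np.mean(sw_1d_pp((x_low @ w) / norms, (y_low @ w) / norms, p)) ** (1 / p)

# Toy usage in the regime the paper targets: small mini-batch, high dimension.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 2048))
y = rng.standard_normal((64, 2048)) + 0.5
print(sliced_wasserstein(x, y), hierarchical_sw(x, y))
```

In this sketch the expensive d-dimensional multiplies cost O(n d k) for HSW versus O(n d L) for plain SW with L projections, which is the saving the abstract describes for small mini-batches in high dimension.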
Related papers
- Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions [39.94228953940542]
Sliced Wasserstein (SW) and Generalized Sliced Wasserstein (GSW) have been widely used in applications due to their computational and statistical scalability.
We propose two new slicing operators, i.e., the Partial Generalized Radon Transform (PGRT) and the Hierarchical Hybrid Radon Transform (HHRT).
By using HHRT, we extend SW to the Hierarchical Hybrid Sliced Wasserstein (H2SW) distance, which is designed specifically for comparing heterogeneous joint distributions.
arXiv Detail & Related papers (2024-04-23T03:04:22Z)
- Sliced Wasserstein Estimation with Control Variates [47.18652387199418]
Sliced Wasserstein (SW) distances between two probability measures are defined as the expectation of the Wasserstein distance between two one-dimensional projections.
Due to the intractability of the expectation, Monte Carlo integration is performed to estimate the value of the SW distance.
Although SW has many variants, no prior work has improved the Monte Carlo estimation scheme for the SW distance itself.
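As a rough illustration of what a variance-reduced Monte Carlo scheme for SW can look like, the hedged NumPy sketch below uses a generic control variate rather than this paper's specific construction: g(theta) = (theta^T (mean(x) - mean(y)))^2 has the closed-form expectation ||mean(x) - mean(y)||^2 / d for theta uniform on the unit sphere, since E[theta theta^T] = I/d.

```python
import numpy as np

def sw2_sq_cv(x, y, n_proj=100, seed=0):
    """SW_2^2 Monte Carlo estimate with a generic mean-difference control variate."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    theta = rng.standard_normal((d, n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    f = np.mean((np.sort(x @ theta, axis=0) - np.sort(y @ theta, axis=0)) ** 2, axis=0)
    delta = x.mean(axis=0) - y.mean(axis=0)
    g = (delta @ theta) ** 2                 # control variate per projection
    g_mean = delta @ delta / d               # its exact expectation
    c_mat = np.cov(f, g)
    c = c_mat[0, 1] / (c_mat[1, 1] + 1e-12)  # estimated optimal coefficient
    return np.mean(f - c * (g - g_mean))     # variance-reduced estimate
```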
arXiv Detail & Related papers (2023-04-30T06:03:17Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
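A hedged sketch of the linearized-OT embedding idea (the names and the PCA step are stand-ins for the paper's Wassmap pipeline): embed each point-cloud measure through its optimal transport map to a fixed reference, then reduce dimension linearly. For equal-size clouds with uniform weights, the OT plan is an assignment, so pairwise distance computations between all measures are never needed.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lot_embed(clouds, reference):
    """Embed each cloud X_i via its OT map to the reference: vectorize T_i - Id."""
    emb = []
    for x in clouds:
        cost = ((reference[:, None, :] - x[None, :, :]) ** 2).sum(-1)
        _, col = linear_sum_assignment(cost)      # exact OT for uniform weights
        emb.append((x[col] - reference).ravel())  # linearized OT coordinates
    return np.stack(emb)

def wassmap_sketch(clouds, reference, n_components=2):
    e = lot_embed(clouds, reference)
    e = e - e.mean(axis=0)                        # center, then PCA via SVD
    _, _, vt = np.linalg.svd(e, full_matrices=False)
    return e @ vt[:n_components].T                # one low-dim point per measure
```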
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Markovian Sliced Wasserstein Distances: Beyond Independent Projections [51.80527230603978]
We introduce a new family of SW distances, named Markovian sliced Wasserstein (MSW) distance, which imposes a first-order Markov structure on projecting directions.
We compare MSW with previous SW variants in various applications, such as gradient flows, color transfer, and deep generative modeling, to demonstrate its favorable performance.
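A hedged sketch of the first-order Markov structure on projection directions: each direction is a random perturbation of the previous one, renormalized to the unit sphere. The random-walk transition and the step size kappa are illustrative choices, not necessarily the transition kernels studied in the paper; the resulting (d, n_proj) matrix can be plugged into any standard SW estimator in place of independent directions.

```python
import numpy as np

def markov_projections(d, n_proj, kappa=0.1, seed=0):
    """Directions theta_1, ..., theta_L forming a random walk on the unit sphere."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)
    thetas = [theta]
    for _ in range(n_proj - 1):
        theta = theta + kappa * rng.standard_normal(d)  # perturb previous direction
        theta /= np.linalg.norm(theta)                  # renormalize to the sphere
        thetas.append(theta)
    return np.stack(thetas, axis=1)                     # (d, n_proj), not independent
```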
arXiv Detail & Related papers (2023-01-10T01:58:15Z)
- Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections [19.987683989865708]
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications.
We propose a new perspective to approximate SW by making use of the concentration of measure phenomenon.
Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation.
arXiv Detail & Related papers (2021-06-29T13:56:19Z)
- Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample-based bidirectional learning algorithm for training.
arXiv Detail & Related papers (2021-02-05T04:25:28Z)
- On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
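A hedged sketch contrasting the two quantities. True PRW optimizes over the Stiefel manifold; the random search below is only an illustration, and equal-size clouds with uniform weights let each projected OT be solved exactly as an assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_sq(x, y):
    """Squared W_2 between equal-size uniform point clouds (assignment OT)."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

def random_subspace(d, k, rng):
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q                                          # orthonormal d x k basis

def iprw_sq(x, y, k=2, n_sub=50, seed=0):
    """IPRW: average projected OT over random k-dim subspaces."""
    rng = np.random.default_rng(seed)
    subs = [random_subspace(x.shape[1], k, rng) for _ in range(n_sub)]
    return np.mean([w2_sq(x @ u, y @ u) for u in subs])

def prw_sq_random_search(x, y, k=2, n_sub=50, seed=0):
    """PRW (crude): maximize projected OT over the same sampled subspaces."""
    rng = np.random.default_rng(seed)
    subs = [random_subspace(x.shape[1], k, rng) for _ in range(n_sub)]
    return max(w2_sq(x @ u, y @ u) for u in subs)
```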
arXiv Detail & Related papers (2020-06-22T14:35:33Z)
- Augmented Sliced Wasserstein Distances [55.028065567756066]
We propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs).
ASWDs are constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks.
Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants for both synthetic and real-world problems.
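A hedged sketch of the augmentation mechanism: lift each sample x onto the hypersurface (x, f(x)) and slice in the augmented space. The paper parameterizes f with a trained neural network; the fixed random tanh map below, and the scale lam, are illustrative stand-ins that keep the mapping injective (x is retained in the output).

```python
import numpy as np

def augment(x, w, lam=1.0):
    """Lift x to the graph of f: x -> (x, lam * tanh(x W)); injective since x is kept."""
    return np.concatenate([x, lam * np.tanh(x @ w)], axis=1)

def aswd_sketch(x, y, n_aug=32, n_proj=100, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.standard_normal((d, n_aug)) / np.sqrt(d)    # fixed map (paper: learned net)
    xa, ya = augment(x, w), augment(y, w)
    theta = rng.standard_normal((xa.shape[1], n_proj))  # slice in augmented space
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    xp, yp = np.sort(xa @ theta, axis=0), np.sort(ya @ theta, axis=0)
    return np.sqrt(np.mean((xp - yp) ** 2))
```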
arXiv Detail & Related papers (2020-06-15T23:00:08Z)
- Distributional Sliced-Wasserstein and Applications to Generative Modeling [27.014748003733544]
Sliced-Wasserstein distance (SW) and its variant, Max Sliced-Wasserstein distance (Max-SW), have been widely used in recent years.
We propose a novel distance, named Distributional Sliced-Wasserstein distance (DSW).
We show that the DSW is a generalization of Max-SW, and it can be computed efficiently by searching for the optimal push-forward measure.
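Since DSW is presented as a generalization of Max-SW, a hedged sketch of the Max-SW special case is given below, using projected gradient ascent on the unit sphere; the step size, iteration count, and the sorting-based gradient (exact while the sort order is fixed) are illustrative choices, and equal-size clouds are assumed.

```python
import numpy as np

def max_sw(x, y, n_iter=200, lr=0.1, seed=0):
    """Max-SW_2 by projected gradient ascent over the projection direction."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(x.shape[1])
    theta /= np.linalg.norm(theta)
    for _ in range(n_iter):
        xs = x[np.argsort(x @ theta)]          # sort supports along theta
        ys = y[np.argsort(y @ theta)]          # optimal 1-D coupling = sorted pairing
        resid = (xs - ys) @ theta
        grad = 2.0 * (xs - ys).T @ resid / len(x)  # gradient of W_2^2(theta)
        theta = theta + lr * grad              # ascent step...
        theta /= np.linalg.norm(theta)         # ...projected back to the sphere
    return np.sqrt(np.mean((np.sort(x @ theta) - np.sort(y @ theta)) ** 2))
```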
arXiv Detail & Related papers (2020-02-18T04:35:16Z)
- Projected Stein Variational Gradient Descent [8.359602391273755]
We propose a projected Stein variational gradient descent (pSVGD) method that overcomes the curse of dimensionality by exploiting the intrinsic low dimensionality of the data-informed subspace.
We adaptively construct the subspace using a gradient information matrix of the log-likelihood, and apply pSVGD to the much lower-dimensional coefficients of the parameter projection.
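A hedged NumPy sketch of the projection mechanism, not the paper's adaptive algorithm: the subspace below is built once from the initial particles rather than adaptively, and the RBF bandwidth, the rank r, and the frozen orthogonal complement are illustrative simplifications.

```python
import numpy as np

def svgd_step(w, grad_logp, step=0.1):
    """One SVGD update with an RBF kernel (median-heuristic bandwidth)."""
    n = len(w)
    diff = w[:, None, :] - w[None, :, :]           # diff[i, j] = w_i - w_j
    sq = (diff ** 2).sum(-1)
    h = np.median(sq) / np.log(n + 1) + 1e-12
    k = np.exp(-sq / h)
    repulsion = (2.0 / h) * (k[..., None] * diff).sum(axis=1)
    return w + step * (k @ grad_logp + repulsion) / n

def psvgd(x, grad_logp_fn, r=2, n_iter=100, step=0.1):
    g = grad_logp_fn(x)                            # (n, d) scores at the particles
    info = g.T @ g / len(x)                        # gradient information matrix
    _, vecs = np.linalg.eigh(info)
    u = vecs[:, -r:]                               # dominant r-dimensional subspace
    w = x @ u                                      # lower-dimensional coefficients
    x_perp = x - w @ u.T                           # frozen orthogonal complement
    for _ in range(n_iter):
        gw = grad_logp_fn(w @ u.T + x_perp) @ u    # score projected onto subspace
        w = svgd_step(w, gw, step)
    return w @ u.T + x_perp
```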
arXiv Detail & Related papers (2020-02-09T23:17:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.