Wasserstein k-means with sparse simplex projection
- URL: http://arxiv.org/abs/2011.12542v1
- Date: Wed, 25 Nov 2020 06:37:45 GMT
- Title: Wasserstein k-means with sparse simplex projection
- Authors: Takumi Fukunaga, Hiroyuki Kasai
- Abstract summary: This paper proposes a faster Wasserstein $k$-means algorithm for histogram data.
We shrink data samples, centroids, and the ground cost matrix.
We harness sparse simplex projection while keeping the degradation of clustering quality low.
- Score: 20.661025590877774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a faster Wasserstein $k$-means algorithm for
histogram data that reduces the number of Wasserstein distance computations and
exploits sparse simplex projection. We shrink the data samples, centroids, and
ground cost matrix, which considerably reduces the computation needed to solve
the optimal transport problems without loss of clustering quality. Furthermore,
we dynamically reduce the computational complexity by removing lower-valued
data samples and harnessing sparse simplex projection while keeping the
degradation of clustering quality low. We designate the proposed algorithm
sparse simplex projection based Wasserstein $k$-means, or SSPW $k$-means.
Numerical evaluations against the standard Wasserstein $k$-means algorithm
demonstrate the effectiveness of the proposed SSPW $k$-means on real-world
datasets.
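The core primitive named in the abstract, sparse simplex projection, can be sketched as follows: keep only the $k$ largest entries of a vector and project them onto the probability simplex, zeroing the rest. This is a generic numpy illustration of the technique (a greedy top-$k$ selection followed by a standard Euclidean simplex projection), not the paper's exact SSPW $k$-means implementation; the function names are my own.

```python
import numpy as np

def simplex_projection(v):
    # Euclidean projection of v onto the probability simplex
    # {w : w >= 0, sum(w) = 1}, via the standard sort-and-threshold rule.
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0                   # cumulative sums minus the target mass
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    tau = css[rho] / (rho + 1.0)               # shrinkage threshold
    return np.maximum(v - tau, 0.0)

def sparse_simplex_projection(v, k):
    # Keep the k largest entries, project them onto the simplex,
    # and set every other entry to zero, yielding a k-sparse histogram.
    idx = np.argpartition(v, -k)[-k:]
    w = np.zeros_like(v, dtype=float)
    w[idx] = simplex_projection(v[idx].astype(float))
    return w
```

The sparsified histograms have at most $k$ nonzero bins, so the optimal transport problems inside the $k$-means loop involve far smaller cost matrices, which is where the speed-up in the abstract comes from.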
Related papers
- Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
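The linearization idea behind LOT Wassmap is easiest to see in one dimension, where the optimal transport map between empirical measures with equal sample counts simply pairs points in sorted order. The sketch below is only this 1-D special case (with my own function name), not the LOT Wassmap algorithm itself: each measure is embedded as a vector, and Euclidean distances between embeddings recover 2-Wasserstein distances without any pairwise OT solves.

```python
import numpy as np

def lot_embed_1d(samples):
    # Embed each 1-D empirical measure (equal sample counts assumed) by
    # sorting its support points. In 1-D the OT map pairs sorted points,
    # so ||embed(a) - embed(b)|| / sqrt(n) equals the exact W2 distance.
    return np.stack([np.sort(np.asarray(s, dtype=float)) for s in samples])
```

Once every measure is a Euclidean vector, dimensionality reduction (e.g. MDS or PCA) can be run directly on the embeddings, which is the computational saving the summary refers to.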
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Sketch-and-Lift: Scalable Subsampled Semidefinite Program for $K$-means
Clustering [16.153709556346417]
Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering.
We introduce a linear time complexity algorithm for approximating an SDP relaxed $K$-means clustering.
Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms.
arXiv Detail & Related papers (2022-01-20T15:31:28Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Partial Wasserstein Covering [10.52782170493037]
We consider a general task called partial Wasserstein covering with the goal of emulating a large dataset.
We model this problem as a discrete optimization problem with partial Wasserstein divergence as an objective function.
We show that we can efficiently make two datasets similar in terms of partial Wasserstein divergence, including driving scene datasets.
arXiv Detail & Related papers (2021-06-02T01:48:41Z) - Projection Robust Wasserstein Barycenter [36.97843660480747]
Approximating the Wasserstein barycenter is numerically challenging because of the curse of dimensionality.
This paper proposes the projection robust Wasserstein barycenter (PRWB) that mitigates the curse of dimensionality.
arXiv Detail & Related papers (2021-02-05T19:23:35Z) - Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample based bidirectional learning algorithm for training.
arXiv Detail & Related papers (2021-02-05T04:25:28Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - K-Deep Simplex: Deep Manifold Learning via Local Dictionaries [8.137198664755598]
We propose K-Deep Simplex which learns a dictionary comprising synthetic landmarks, along with representation coefficients supported on a simplex.
We solve the proposed optimization program using alternating minimization and design an efficient, interpretable autoencoder using algorithm unrolling.
Experiments show that the algorithm is highly efficient and performs competitively on synthetic and real data sets.
arXiv Detail & Related papers (2020-12-03T18:13:26Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
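The last entry's "clustering algorithm based on PCA" can be illustrated with a generic PCA-then-Lloyd pipeline: project the data onto the top principal components, then run plain $k$-means on the projections. This is a minimal sketch of that general recipe (function name and defaults are my own), not the paper's exact algorithm or its analyzed variant.

```python
import numpy as np

def pca_then_kmeans(X, r, k, n_iter=50, seed=0):
    # Reduce X to its top-r principal components, then run Lloyd's
    # algorithm on the projected points and return cluster labels.
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:r].T                            # r-dimensional projections
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute centers.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

Working in the $r$-dimensional projected space makes each Lloyd iteration cost $O(nrk)$ instead of $O(ndk)$, which is the usual motivation for combining PCA with $k$-means on high-dimensional data.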
This list is automatically generated from the titles and abstracts of the papers in this site.