Accelerating Spherical k-Means
- URL: http://arxiv.org/abs/2107.04074v1
- Date: Thu, 8 Jul 2021 19:24:09 GMT
- Title: Accelerating Spherical k-Means
- Authors: Erich Schubert and Andreas Lang and Gloria Feher
- Abstract summary: Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors.
Many acceleration techniques, such as the algorithms of Elkan and Hamerly, rely on the triangle inequality of Euclidean distances.
In this paper, we incorporate the Elkan and Hamerly accelerations to the spherical k-means algorithm working directly with the Cosines instead of Euclidean distances.
- Score: 1.7188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spherical k-means is a widely used clustering algorithm for sparse and
high-dimensional data such as document vectors. While several improvements and
accelerations have been introduced for the original k-means algorithm, not all
easily translate to the spherical variant: Many acceleration techniques, such
as the algorithms of Elkan and Hamerly, rely on the triangle inequality of
Euclidean distances. However, spherical k-means uses Cosine similarities
instead of distances for computational efficiency. In this paper, we
incorporate the Elkan and Hamerly accelerations to the spherical k-means
algorithm working directly with the Cosines instead of Euclidean distances to
obtain a substantial speedup and evaluate these spherical accelerations on real
data.
Related papers
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds [53.110934987571355]
We propose Geodesic Sinkhorn -- based on a heat kernel on a manifold graph.
We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy.
arXiv Detail & Related papers (2022-11-02T00:51:35Z) - Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25$ times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z) - A density peaks clustering algorithm with sparse search and K-d tree [16.141611031128427]
Density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem.
Experiments are carried out on datasets with different distribution characteristics, by comparing with other five typical clustering algorithms.
arXiv Detail & Related papers (2022-03-02T09:29:40Z) - Rectified Euler k-means and Beyond [27.471041765364884]
Euler k-means (EulerK) first maps data onto the unit hyper-sphere surface of equi-dimensional space via a complex mapping.
EulerK is also robust to noises and outliers.
We propose two Rectified Euler k-means methods, i.e., REK1 and REK2, which retain the merits of EulerK while acquire real centroids residing on the mapped space to better characterize the data structures.
arXiv Detail & Related papers (2021-08-06T12:35:28Z) - Efficient Sparse Spherical k-Means for Document Clustering [13.217173710137363]
We propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k.
Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
arXiv Detail & Related papers (2021-07-30T12:02:33Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW)
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Fast Gravitational Approach for Rigid Point Set Registration with
Ordinary Differential Equations [79.71184760864507]
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA)
In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field.
We show that the new method class has characteristics not found in previous alignment methods.
arXiv Detail & Related papers (2020-09-28T15:05:39Z) - Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.