Rectified Euler k-means and Beyond
- URL: http://arxiv.org/abs/2108.03081v1
- Date: Fri, 6 Aug 2021 12:35:28 GMT
- Title: Rectified Euler k-means and Beyond
- Authors: Yunxia Lin, Songcan Chen
- Abstract summary: Euler k-means (EulerK) first maps data onto the unit hyper-sphere surface of an equi-dimensional space via a complex mapping.
EulerK is also robust to noise and outliers.
We propose two Rectified Euler k-means methods, REK1 and REK2, which retain the merits of EulerK while acquiring real centroids that reside in the mapped space to better characterize the data structures.
- Score: 27.471041765364884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Euler k-means (EulerK) first maps data onto the unit hyper-sphere surface of an equi-dimensional space via a complex mapping that induces the robust Euler kernel, and then employs the popular $k$-means. Consequently, besides enjoying the virtues of k-means such as simplicity and scalability to large data sets, EulerK is also robust to noise and outliers. Even so, the centroids captured by EulerK deviate from the unit hyper-sphere surface and thus, in a strict distributional sense, are actually outliers. This counterintuitive phenomenon also occurs in some generic kernel clustering methods. Intuitively, relying on such outlier-like centroids is hardly reasonable, yet the issue has received little attention. To eliminate the deviation, we propose two Rectified Euler k-means methods, REK1 and REK2, which retain the merits of EulerK while acquiring real centroids that reside in the mapped space and thus better characterize the data structures. Specifically, REK1 rectifies EulerK by imposing a constraint on the centroids, while REK2 views each centroid as the mapped image of a pre-image in the original space and optimizes these pre-images in the Euler-kernel-induced space. Our proposed REKs can methodologically be extended to solve other problems of this category. Finally, experiments validate the effectiveness of REK1 and REK2.
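To make the pipeline concrete, below is a minimal NumPy sketch. It assumes the element-wise Euler feature map phi(x) = (1/sqrt(2)) * e^{i*alpha*pi*x} that underlies the Euler kernel; the projection of centroids back onto the sphere is a simplified stand-in for REK1's constrained update, not the authors' exact optimization, and the function names are illustrative.

```python
import numpy as np

def euler_map(X, alpha=1.0):
    """Element-wise Euler mapping phi(x) = (1/sqrt(2)) * exp(i*alpha*pi*x).
    Every mapped sample lies on a hyper-sphere surface in the
    equi-dimensional complex space (each coordinate has modulus 1/sqrt(2))."""
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(2)

def rectified_euler_kmeans(X, k, alpha=1.0, n_iter=50, seed=0):
    """k-means in the Euler-mapped space with a REK1-style rectification:
    plain mean centroids drift off the sphere, so after each update they
    are re-projected to modulus 1/sqrt(2) per coordinate. This projection
    is an illustrative simplification of the paper's constrained update."""
    rng = np.random.default_rng(seed)
    Phi = euler_map(X, alpha)                              # (n, d) complex
    C = Phi[rng.choice(len(Phi), size=k, replace=False)]   # initial centroids
    for _ in range(n_iter):
        # assignment: squared distance |phi(x) - c|^2 in the mapped space
        d2 = (np.abs(Phi[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = Phi[labels == j]
            if len(members) == 0:
                continue
            mean = members.mean(axis=0)                    # plain EulerK centroid: off the sphere
            C[j] = mean / np.maximum(np.abs(mean), 1e-12) / np.sqrt(2)
    return labels, C
```

Per coordinate, |phi(x) - phi(y)|^2 = 1 - cos(alpha*pi*(x - y)) is bounded, which is the source of the robustness to noise and outliers that EulerK enjoys and the REKs retain.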
Related papers
- An Approach Towards Learning K-means-friendly Deep Latent Representation [0.6798775532273751]
Clustering is a long-standing problem area in data mining.
With the advent of deep neural networks, a common approach to this problem is to map the data to some latent space of comparatively lower dimensions.
A well-known centroid-based clustering algorithm is K-means.
arXiv Detail & Related papers (2024-11-29T06:28:38Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - KPZ scaling from the Krylov space [83.88591755871734]
Recently, a superdiffusion exhibiting the Kardar-Parisi-Zhang scaling in late-time correlators and autocorrelators has been reported.
Inspired by these results, we explore the KPZ scaling in correlation functions using their realization in the Krylov operator basis.
arXiv Detail & Related papers (2024-06-04T20:57:59Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds [17.226362076527764]
- No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds [17.226362076527764]
Centroid-based clustering methods such as k-means, k-medoids and k-centers are heavily applied as go-to tools in exploratory data analysis.
In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset.
We propose to remedy such a scenario by introducing a maximal radius constraint $r$ on the clusters formed by centroids.
arXiv Detail & Related papers (2022-03-04T18:59:02Z) - Accelerating Spherical k-Means [1.7188280334580197]
- Accelerating Spherical k-Means [1.7188280334580197]
Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors.
Many acceleration techniques, such as the algorithms of Elkan and Hamerly, rely on the triangle inequality of Euclidean distances.
In this paper, we incorporate the Elkan and Hamerly accelerations into the spherical k-means algorithm, working directly with cosine similarities instead of Euclidean distances.
arXiv Detail & Related papers (2021-07-08T19:24:09Z) - Adaptive Explicit Kernel Minkowski Weighted K-means [1.3535770763481905]
- Adaptive Explicit Kernel Minkowski Weighted K-means [1.3535770763481905]
The kernel K-means, which extends K-means into the kernel space, is able to capture nonlinear structures and identify arbitrarily shaped clusters.
This paper proposes a method to combine the advantages of the linear and nonlinear approaches by using data-driven approximate finite-dimensional feature maps.
arXiv Detail & Related papers (2020-12-04T19:14:09Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
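The paper's explicit feature maps are its own construction; as a generic stand-in, this sketch uses classic random Fourier features, one standard way to build an approximate finite-dimensional feature map for a kernel (an assumption here, not the authors' method).

```python
import numpy as np

def rff_features(X, n_features=256, gamma=1.0, seed=0):
    """Random Fourier features z(x) whose inner products approximate the
    Gaussian kernel k(x, y) = exp(-gamma * ||x - y||**2). Ordinary k-means
    on z(X) then approximates kernel k-means at cost linear in the sample size."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```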
- (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the use of a continuous version of dynamic time warping (DTW) as the distance measure, which we call continuous dynamic time warping (CDTW).
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Debiased Sinkhorn barycenters [110.79706180350507]
- Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has driven much of the recent interest in Wasserstein metrics and barycenters in machine learning.
We show how the bias it introduces is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
arXiv Detail & Related papers (2020-06-03T23:06:02Z) - Ball k-means [53.89505717006118]
- Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
Its fast speed, absence of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.