A Review and Evaluation of Elastic Distance Functions for Time Series
Clustering
- URL: http://arxiv.org/abs/2205.15181v2
- Date: Wed, 26 Apr 2023 08:17:18 GMT
- Title: A Review and Evaluation of Elastic Distance Functions for Time Series
Clustering
- Authors: Chris Holder, Matthew Middlehurst and Anthony Bagnall
- Abstract summary: We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering.
The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned, is no better.
Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series clustering is the act of grouping time series data without
recourse to a label. Algorithms that cluster time series can be classified into
two groups: those that employ a time series specific distance measure; and
those that derive features from time series. Both approaches usually rely on
traditional clustering algorithms such as $k$-means. Our focus is on distance
based time series that employ elastic distance measures, i.e. distances that
perform some kind of realignment whilst measuring distance. We describe nine
commonly used elastic distance measures and compare their performance with
k-means and k-medoids clustering. Our findings are surprising. The most popular
technique, dynamic time warping (DTW), performs worse than Euclidean distance
with k-means, and even when tuned, is no better. Using k-medoids rather than
k-means improved the clusterings for all nine distance measures. DTW is not
significantly better than Euclidean distance with k-medoids. Generally,
distance measures that employ editing in conjunction with warping perform
better, and one distance measure, the move-split-merge (MSM) method, is the
best performing measure of this study. We also compare to clustering with DTW
using barycentre averaging (DBA). We find that DBA does improve DTW k-means,
but that the standard DBA is still worse than using MSM. Our conclusion is to
recommend MSM with k-medoids as the benchmark algorithm for clustering time
series with elastic distance measures. We provide implementations in the aeon
toolkit, results and guidance on reproducing results on the associated GitHub
repository.
Related papers
- On time series clustering with k-means [0.5530212768657544]
Time series clustering algorithms are often presented with k-means configured in various ways.
This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration.
We propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach.
arXiv Detail & Related papers (2024-10-18T08:24:07Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Kernel distance measures for time series, random fields and other
structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z) - SOMTimeS: Self Organizing Maps for Time Series Clustering and its
Application to Serious Illness Conversations [3.2689702143620147]
We present a new DTW-based clustering method called SOMTimeS (a Self-Organizing Map for TIME Series)
It scales better and runs faster than other DTW-based clustering algorithms, and has similar performance accuracy.
We applied SOMtimeS to natural language conversation data collected as part of a large healthcare cohort study.
arXiv Detail & Related papers (2021-08-26T00:18:25Z) - Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z) - Efficient Sparse Spherical k-Means for Document Clustering [13.217173710137363]
We propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k.
Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
arXiv Detail & Related papers (2021-07-30T12:02:33Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW)
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the
PAM, CLARA, and CLARANS Algorithms [0.0]
Partitioning Around Medoids (PAM) is an algorithm for clustering non-Euclidean data.
We propose modifications to PAM that achieve an O(k)-fold speedup in the second ("SWAP") phase of the algorithm.
In experiments on real data with k=100,200, we observed a 458x respectively 1191x speedup compared to the original PAM SWAP algorithm.
arXiv Detail & Related papers (2020-08-12T08:37:50Z) - Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.