Related papers: A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

URL: http://arxiv.org/abs/2205.15181v2
Date: Wed, 26 Apr 2023 08:17:18 GMT
Title: A Review and Evaluation of Elastic Distance Functions for Time Series Clustering
Authors: Chris Holder, Matthew Middlehurst and Anthony Bagnall
Abstract summary: We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned, is no better. Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time series specific distance measure; and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as $k$-means. Our focus is on distance based time series that employ elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned, is no better. Using k-medoids rather than k-means improved the clusterings for all nine distance measures. DTW is not significantly better than Euclidean distance with k-medoids. Generally, distance measures that employ editing in conjunction with warping perform better, and one distance measure, the move-split-merge (MSM) method, is the best performing measure of this study. We also compare to clustering with DTW using barycentre averaging (DBA). We find that DBA does improve DTW k-means, but that the standard DBA is still worse than using MSM. Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures. We provide implementations in the aeon toolkit, results and guidance on reproducing results on the associated GitHub repository.

Related papers

Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering [0.6215404942415159]
We present a new time series clustering (TSCL) algorithm, the $k$-means (K) accelerated (A) subgradient (S) Barycentre (B) Average (A) KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at all stages of clustering, applies a randomised subgradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence and exploits the metric property of MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for
arXiv Detail & Related papers (2024-11-26T19:26:17Z)
On time series clustering with k-means [0.5530212768657544]
Time series clustering algorithms are often presented with k-means configured in various ways. This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration. We propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach.
arXiv Detail & Related papers (2024-10-18T08:24:07Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct distance matrix between data points by Butterworth filter. To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data. It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
SOMTimeS: Self Organizing Maps for Time Series Clustering and its Application to Serious Illness Conversations [3.2689702143620147]
We present a new DTW-based clustering method called SOMTimeS (a Self-Organizing Map for TIME Series) It scales better and runs faster than other DTW-based clustering algorithms, and has similar performance accuracy. We applied SOMtimeS to natural language conversation data collected as part of a large healthcare cohort study.
arXiv Detail & Related papers (2021-08-26T00:18:25Z)
Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points. We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
Efficient Sparse Spherical k-Means for Document Clustering [13.217173710137363]
We propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
arXiv Detail & Related papers (2021-07-30T12:02:33Z)
(k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories. We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW) We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z)
Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems. Our results imply an improved algorithm for the Sample and Aggregate privacy framework. One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms [0.0]
Partitioning Around Medoids (PAM) is an algorithm for clustering non-Euclidean data. We propose modifications to PAM that achieve an O(k)-fold speedup in the second ("SWAP") phase of the algorithm. In experiments on real data with k=100,200, we observed a 458x respectively 1191x speedup compared to the original PAM SWAP algorithm.
arXiv Detail & Related papers (2020-08-12T08:37:50Z)
Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.