On time series clustering with k-means
- URL: http://arxiv.org/abs/2410.14269v1
- Date: Fri, 18 Oct 2024 08:24:07 GMT
- Title: On time series clustering with k-means
- Authors: Christopher Holder, Anthony Bagnall, Jason Lines
- Abstract summary: Time series clustering algorithms are often presented with k-means configured in various ways.
This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration.
We propose a standard Lloyd's-based model for time series clustering (TSCL) that adopts an end-to-end approach.
- Score: 0.5530212768657544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   There is a long history of research into time series clustering using distance-based partitional clustering. Many of the most popular algorithms adapt k-means (also known as Lloyd's algorithm) to exploit time dependencies in the data by specifying a time series distance function. However, these algorithms are often presented with k-means configured in various ways, altering key parameters such as the initialisation strategy. This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration. To address this, we propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach, incorporating a specialised distance function not only in the assignment step but also in the initialisation and stopping criteria. By doing so, we create a unified structure for comparing seven popular Lloyd's-based TSCL algorithms. This common framework enables us to more easily attribute differences in clustering performance to the distance function itself, rather than variations in the k-means configuration. 
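
The abstract's central idea is to pass one distance function through every stage of Lloyd's algorithm: initialisation, assignment and the stopping check. The sketch below is a hypothetical illustration of that idea, not the authors' code. The names `lloyd`, `kmeans_pp_init`, `assign`, `mean_series` and `dtw` are invented here. It also assumes k-means++ style seeding, a stopping rule based on a small change in total within-cluster distance, and a pointwise arithmetic mean for the centre update. The last choice is a simplification that requires equal-length series; elastic distances are often paired with other averaging methods, which are not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def squared_euclidean(a: Series, b: Series) -> float:
    """Pointwise squared Euclidean distance (equal-length series)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw(a: Series, b: Series) -> float:
    """Full-window dynamic time warping with squared pointwise cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def kmeans_pp_init(data: List[Series], k: int, dist: Distance,
                   rng: random.Random) -> List[List[float]]:
    """k-means++ style seeding whose sampling weights are computed with `dist`."""
    centres = [list(rng.choice(data))]
    while len(centres) < k:
        # weight each series by its distance to the nearest centre chosen so far
        weights = [min(dist(x, c) for c in centres) for x in data]
        if sum(weights) == 0:
            centres.append(list(rng.choice(data)))  # degenerate case: uniform pick
        else:
            centres.append(list(rng.choices(data, weights=weights, k=1)[0]))
    return centres


def assign(data: List[Series], centres: List[List[float]],
           dist: Distance) -> Tuple[List[int], float]:
    """Nearest-centre labels plus the total within-cluster distance."""
    labels, total = [], 0.0
    for x in data:
        ds = [dist(x, c) for c in centres]
        j = min(range(len(ds)), key=ds.__getitem__)
        labels.append(j)
        total += ds[j]
    return labels, total


def mean_series(members: List[Series]) -> List[float]:
    """Pointwise arithmetic mean; assumes all members share one length."""
    return [sum(s[t] for s in members) / len(members) for t in range(len(members[0]))]


def lloyd(data: List[Series], k: int, dist: Distance, max_iter: int = 100,
          tol: float = 1e-6, seed: int = 0) -> Tuple[List[int], List[List[float]], float]:
    """Lloyd's loop with one distance used for seeding, assignment and stopping."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(data, k, dist, rng)
    labels, inertia = assign(data, centres, dist)
    for _ in range(max_iter):
        # update step: new centre per cluster, keeping the old one if a cluster empties
        centres = [
            mean_series([x for x, lab in zip(data, labels) if lab == j])
            if j in labels else centres[j]
            for j in range(k)
        ]
        labels, new_inertia = assign(data, centres, dist)
        # stopping rule evaluated with the same distance function
        converged = abs(inertia - new_inertia) < tol
        inertia = new_inertia
        if converged:
            break
    return labels, centres, inertia
```

Calling `lloyd(data, k=2, dist=dtw)` and then `lloyd(data, k=2, dist=squared_euclidean)` changes the distance in all three stages at once. This is the kind of controlled comparison the abstract argues for, because the rest of the configuration stays fixed.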

Related papers
- A system identification approach to clustering vector autoregressive time series [50.66782357329375]
 Clustering time series based on their underlying dynamics keeps attracting researchers because of its role in complex system modelling.
 Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction.
 Instead of relying on feature/metric construction, the system identification approach allows vector time series to be clustered by explicitly considering their underlying autoregressive dynamics.
 arXiv (2025-05-20T14:31:44Z)
- K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
 k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters.
 It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters.
 We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
 arXiv (2025-05-17T08:41:07Z)
- Clustering of timed sequences -- Application to the analysis of care pathways [0.0]
 Revealing typical care pathways can be achieved through clustering.
 The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms.
 arXiv (2024-04-23T07:16:13Z)
- Automated regime detection in multidimensional time series data using sliced Wasserstein k-means clustering [0.0]
 We study the behaviour of the Wasserstein k-means clustering algorithm applied to time series data.
 We extend the technique to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance.
 We show that the resulting sliced Wasserstein k-means (sWk-means) method is effective in identifying distinct market regimes in real multidimensional financial time series.
 arXiv (2023-10-02T15:37:56Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
 We devise an efficient algorithm that recovers clusters using the observed labels.
 We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds derived in the paper, both in expectation and with high probability.
 arXiv (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
 We present a new clustering algorithm which directly detects clusters of data without mean estimation.
 Specifically, we construct a distance matrix between data points using a Butterworth filter.
 To better exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
 arXiv (2023-05-12T03:01:41Z)
- Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
 Machine learning models, such as SVMs, require a definition of distance/similarity between pairs of sequences.
 Exact methods yield better classification performance, but they incur high computational costs.
 We propose a series of ways to improve the predictive performance of the approximate kernel.
 arXiv (2022-09-11T22:44:19Z)
- k-MS: A novel clustering algorithm based on morphological reconstruction [0.0]
 k-MS is faster than CPU-parallel k-means in worst-case scenarios.
 It is also faster than similar clustering methods that are sensitive to density and shape, such as Mitosis and TRICLUST.
 arXiv (2022-08-30T16:55:21Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
 We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
 arXiv (2021-10-26T20:41:19Z)
- Robust Trimmed k-means [70.88503833248159]
 We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
 arXiv (2021-08-16T15:49:40Z)
- ThetA -- fast and robust clustering via a distance parameter [3.0020405188885815]
 Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades.
 We propose a new set of distance threshold methods called Theta-based Algorithms (ThetA).
 arXiv (2021-02-13T23:16:33Z)
- Hierarchical Clustering using Auto-encoded Compact Representation for Time-series Analysis [8.660029077292346]
 We propose a novel mechanism that identifies clusters by combining a learned compact representation of time series, Auto Encoded Compact Sequence (AECS), with a hierarchical clustering approach.
 Our algorithm exploits an undercomplete recurrent neural network (RNN)-based sequence-to-sequence (seq2seq) autoencoder and agglomerative hierarchical clustering.
 arXiv (2021-01-11T08:03:57Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
 We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
 arXiv (2020-08-18T16:22:06Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
 We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
 Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
 Using 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared, for one of the aggregation criteria with L1 dissimilarity, against hierarchical clustering and a version of k-means, partitioning around medoids (PAM).
 arXiv (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
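
The sliced Wasserstein k-means entry in the list approximates a multidimensional Wasserstein distance with a sliced one. The sketch below is a rough, hypothetical illustration of that approximation, not that paper's code. It estimates the sliced Wasserstein-p distance between two point clouds in three steps: project both clouds onto random unit directions, compute the closed-form one-dimensional distance from the sorted projections, and average over directions. It assumes equal sample sizes with uniform weights and Monte Carlo sampling of directions. The function names are invented here.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def wasserstein_1d_equal(u: List[float], v: List[float], p: int = 2) -> float:
    """W_p^p between two 1-D empirical measures with equal sample counts."""
    # with uniform weights and equal counts, the optimal plan matches sorted values
    return sum(abs(a - b) ** p for a, b in zip(sorted(u), sorted(v))) / len(u)


def random_direction(dim: int, rng: random.Random) -> List[float]:
    """Uniform direction on the unit sphere via normalised Gaussian draws."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def sliced_wasserstein(x: List[Point], y: List[Point], n_proj: int = 200,
                       p: int = 2, seed: int = 0) -> float:
    """Monte Carlo estimate of the sliced Wasserstein-p distance."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(dim, rng)
        # project every point onto the current direction
        px = [sum(a * t for a, t in zip(pt, theta)) for pt in x]
        py = [sum(a * t for a, t in zip(pt, theta)) for pt in y]
        total += wasserstein_1d_equal(px, py, p)
    # average W_p^p over directions, then take the p-th root
    return (total / n_proj) ** (1.0 / p)
```

Each projection reduces the problem to sorting, which is why the sliced form is cheaper than solving a full multidimensional transport problem. The cost is Monte Carlo error that shrinks as `n_proj` grows.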