Fast dynamic time warping and clustering in C++
- URL: http://arxiv.org/abs/2307.04904v1
- Date: Mon, 10 Jul 2023 21:08:27 GMT
- Title: Fast dynamic time warping and clustering in C++
- Authors: Volkan Kumtepeli and Rebecca Perriment and David A. Howey
- Abstract summary: We present an approach for computationally efficient dynamic time warping (DTW) and clustering of time-series data.
The method frames the dynamic warping of time series datasets as an optimisation problem solved using dynamic programming.
There is also an option to use k-medoids clustering for increased speed, when a certificate for global optimality is not essential.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach for computationally efficient dynamic time warping
(DTW) and clustering of time-series data. The method frames the dynamic warping
of time series datasets as an optimisation problem solved using dynamic
programming, and then clusters time series data by solving a second
optimisation problem using mixed-integer programming (MIP). There is also an
option to use k-medoids clustering for increased speed, when a certificate for
global optimality is not essential. The improved efficiency of our approach is
due to task-level parallelisation of the clustering alongside DTW. Our approach
was tested using the UCR Time Series Archive, and was found to be, on average,
33% faster than the next fastest option when using the same clustering method.
This increases to 64% faster when considering only larger datasets (with more
than 1000 time series). The MIP clustering is most effective on small numbers
of longer time series, because the DTW computation is faster than other
approaches, but the clustering problem becomes increasingly computationally
expensive as the number of time series to be clustered increases.
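The abstract frames DTW as a dynamic-programming problem and the grouping step as a separate clustering problem. As a rough illustration only (a minimal sketch, not the authors' C++ library or its optimised, parallelised implementation), the classic DTW recurrence fills a cost matrix in which each entry extends the cheapest of three neighbouring alignments:

```cpp
// Minimal DTW sketch (illustrative only, not the paper's implementation).
// D[i][j] holds the cheapest warping cost aligning the first i points of x
// with the first j points of y, built with the standard DP recurrence.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>
#include <vector>

double dtw_distance(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size(), m = y.size();
    const double inf = std::numeric_limits<double>::infinity();
    // (n+1) x (m+1) cost matrix with an infinite border so the recurrence is uniform.
    std::vector<std::vector<double>> D(n + 1, std::vector<double>(m + 1, inf));
    D[0][0] = 0.0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const double cost = std::abs(x[i - 1] - y[j - 1]);  // local distance
            // Extend the cheapest of match (diagonal), insertion, or deletion.
            D[i][j] = cost + std::min({D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]});
        }
    }
    return D[n][m];  // total alignment cost between the two series
}

int main() {
    const std::vector<double> a{0.0, 1.0, 2.0, 3.0, 2.0, 1.0};
    const std::vector<double> b{0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0};
    std::printf("DTW(a, b) = %f\n", dtw_distance(a, b));
    return 0;
}
```

The pairwise DTW distances then feed the clustering stage, which the abstract describes as a mixed-integer program, with k-medoids as a faster alternative when a certificate of global optimality is not needed. A generic k-medoids-style assignment MIP is sketched below; the paper's exact model may differ, and the symbols d_ij, x_ij, y_j are notation introduced here for illustration:

```latex
% Generic k-medoids assignment MIP (illustrative; not necessarily the paper's exact model).
% d_{ij}: precomputed DTW distance between series i and j
% x_{ij}: 1 if series i is assigned to medoid j, 0 otherwise
% y_{j} : 1 if series j is selected as a medoid, 0 otherwise
\begin{align}
\min_{x,\,y} \quad & \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} = 1 && \forall i, \\
                  & x_{ij} \le y_{j} && \forall i, j, \\
                  & \sum_{j=1}^{n} y_{j} = k, \\
                  & x_{ij},\, y_{j} \in \{0, 1\}.
\end{align}
```

Per the abstract, the reported speed-up comes from task-level parallelisation of the clustering alongside the DTW computations, not from changing the DTW recurrence itself.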
Related papers
- Determining the Optimal Number of Clusters for Time Series Datasets with Symbolic Pattern Forest [0.0]
The problem of calculating the optimal number of clusters (say k) is one of the significant challenges for clustering methods.
In this work, we extended the Symbolic Pattern Forest algorithm to determine the optimal number of clusters for the time series datasets.
We tested our approach on the UCR archive datasets, and our experimental results so far showed significant improvement over the baseline.
arXiv Detail & Related papers (2023-10-01T23:33:37Z)
- Dependent Cluster Mapping (DCMAP): Optimal clustering of directed acyclic graphs for statistical inference [0.0]
A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support inference.
We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters.
For 25- and 50-node DBNs, the search space sizes were $9.91\times 10^{9}$ and $1.51\times 10^{21}$ possible cluster mappings, and the first optimal solution was found at iteration 934 (95% CI 926, 971).
arXiv Detail & Related papers (2023-08-08T01:01:37Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- SOMTimeS: Self Organizing Maps for Time Series Clustering and its Application to Serious Illness Conversations [3.2689702143620147]
We present a new DTW-based clustering method called SOMTimeS (a Self-Organizing Map for TIME Series).
It scales better and runs faster than other DTW-based clustering algorithms, with similar accuracy.
We applied SOMTimeS to natural language conversation data collected as part of a large healthcare cohort study.
arXiv Detail & Related papers (2021-08-26T00:18:25Z)
- A New Parallel Adaptive Clustering and its Application to Streaming Data [0.0]
This paper presents a parallel adaptive clustering (PAC) algorithm to automatically classify data while simultaneously choosing a suitable number of classes.
We develop regularized set k-means to efficiently cluster the results from the parallel threads.
We provide theoretical analysis and numerical experiments to characterize the performance of the method.
arXiv Detail & Related papers (2021-04-06T17:18:56Z)
- Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning [55.052517095437]
Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers.
A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is straggling workers.
Coded distributed techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers.
We propose a novel dynamic GC scheme, which assigns redundant data to workers to acquire the flexibility to choose from among a set of possible codes depending on the past straggling behavior.
arXiv Detail & Related papers (2021-03-01T18:51:29Z)
- Hierarchical Clustering using Auto-encoded Compact Representation for Time-series Analysis [8.660029077292346]
We propose a novel mechanism to identify clusters by combining a learned compact representation of time series, the Auto Encoded Compact Sequence (AECS), with a hierarchical clustering approach.
Our algorithm exploits a Recurrent Neural Network (RNN) based undercomplete Sequence-to-Sequence (seq2seq) autoencoder and agglomerative hierarchical clustering.
arXiv Detail & Related papers (2021-01-11T08:03:57Z)
- (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the use of a continuous version of DTW as the distance measure, which we call continuous dynamic time warping (CDTW).
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.