K-ARMA Models for Clustering Time Series Data
- URL: http://arxiv.org/abs/2207.00039v1
- Date: Thu, 30 Jun 2022 18:16:11 GMT
- Title: K-ARMA Models for Clustering Time Series Data
- Authors: Derek O. Hoare, David S. Matteson, and Martin T. Wells
- Abstract summary: We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm.
We show how the clustering algorithm can be made robust to outliers using a least absolute deviations (LAD) criterion.
We perform experiments on real data which show that our method is competitive with other existing methods for similar time series clustering tasks.
- Score: 4.345882429229813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach to clustering time series data using a model-based
generalization of the K-Means algorithm which we call K-Models. We prove the
convergence of this general algorithm and relate it to the hard-EM algorithm
for mixture modeling. We then apply our method first with an AR($p$) clustering
example and show how the clustering algorithm can be made robust to outliers
using a least absolute deviations criterion. We then build our clustering
algorithm up for ARMA($p,q$) models and extend this to ARIMA($p,d,q$) models.
We develop a goodness-of-fit statistic for the models fitted to clusters based
on the Ljung-Box statistic. We perform experiments with simulated data to show
how the algorithm can be used for outlier detection and for detecting distributional
drift, and we discuss the impact of the initialization method on empty clusters. We
also perform experiments on real data which show that our method is competitive
with other existing methods for similar time series clustering tasks.
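The abstract describes K-Models as a K-Means-style loop that alternates between fitting one model per cluster and reassigning each series to the cluster model that fits it best. The following is a minimal NumPy sketch of that idea for AR(p) cluster models, not the authors' implementation. Assumptions: all function names (`lagged_design`, `fit_ls`, `fit_lad`, `k_ar`, `ljung_box`) are hypothetical; the LAD fit is approximated with iteratively reweighted least squares (IRLS); the initial partition is random and an empty cluster is reseeded from one random series; `ljung_box` computes the standard Ljung-Box Q on a single residual series, not the cluster-level statistic the paper develops.

```python
import numpy as np

def lagged_design(x, p):
    """Design matrix [1, x_{t-1}, ..., x_{t-p}] and response x_t for an AR(p) fit."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cols = [np.ones(n - p)] + [x[p - k : n - k] for k in range(1, p + 1)]
    return np.column_stack(cols), x[p:]

def fit_ls(X, y):
    """Ordinary least-squares AR coefficients (intercept first)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_lad(X, y, iters=50, eps=1e-6):
    """Approximate least-absolute-deviations fit via IRLS (an assumed solver choice)."""
    beta = fit_ls(X, y)
    for _ in range(iters):
        # Weights 1/|r_i| make the weighted squared loss approximate sum |r_i|.
        w = np.sqrt(1.0 / np.maximum(np.abs(y - X @ beta), eps))
        beta = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return beta

def k_ar(series, K, p, criterion="ls", max_iter=100, seed=0):
    """Hard-assignment K-Models loop with one AR(p) model per cluster (sketch)."""
    rng = np.random.default_rng(seed)
    designs = [lagged_design(s, p) for s in series]
    fit = fit_ls if criterion == "ls" else fit_lad
    loss = (lambda r: r @ r) if criterion == "ls" else (lambda r: np.abs(r).sum())
    labels = rng.integers(K, size=len(series))  # random initial partition (assumption)
    for _ in range(max_iter):
        # Fit step: one AR(p) model per cluster on the stacked member series.
        coefs = []
        for k in range(K):
            idx = np.flatnonzero(labels == k)
            if idx.size == 0:  # empty cluster: reseed from one random series (assumption)
                idx = rng.integers(len(series), size=1)
            X = np.vstack([designs[i][0] for i in idx])
            y = np.concatenate([designs[i][1] for i in idx])
            coefs.append(fit(X, y))
        # Assignment step: each series goes to the model with the smallest residual loss.
        costs = np.array([[loss(y - X @ b) for b in coefs] for X, y in designs])
        new_labels = costs.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, np.array(coefs)

def ljung_box(resid, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{h} rho_k^2 / (n - k) for one residual series."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    n = len(r)
    rho = np.array([r[k:] @ r[:-k] for k in range(1, h + 1)]) / (r @ r)
    return n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)

    def ar1(phi, n=200):
        # Simulate a zero-mean AR(1) series with standard normal innovations.
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.normal()
        return x

    data = [ar1(0.8) for _ in range(10)] + [ar1(-0.8) for _ in range(10)]
    labels, coefs = k_ar(data, K=2, p=1)
    print(labels)
    print(coefs)  # one row [intercept, phi] per cluster
    X, y = lagged_design(data[0], 1)
    print(ljung_box(y - X @ coefs[labels[0]], h=10))
```

With the least-squares criterion, both steps can only lower or keep the total residual sum of squares (as long as no cluster is reseeded), which is the hard-EM-style monotonicity that K-Means convergence arguments rely on; the IRLS-based LAD variant only approximates its criterion, so `max_iter` acts as a safeguard. For residuals of an ARMA(p, q) fit, Q is commonly compared with a chi-square distribution with h - p - q degrees of freedom.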
Related papers
- A simulation study of cluster search algorithms in data set generated by Gaussian mixture models [0.0]
This study examines centroid- and model-based cluster search algorithms across a variety of data sets generated by Gaussian mixture models (GMMs).
The results show that some cluster-splitting criteria based on Euclidean distance make unreasonable decisions when clusters overlap.
Date: 2024-07-27T07:47:25Z
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds established in the paper, both in expectation and with high probability.
Date: 2023-06-18T08:46:06Z
- clusterBMA: Bayesian model averaging for clustering [1.2021605201770345]
We introduce clusterBMA, a method that enables weighted model averaging across results from unsupervised clustering algorithms.
We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model.
In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters.
Date: 2022-09-09T04:55:20Z
- Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models [0.0]
We propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models.
The proposed method uses a new expectation-maximization algorithm for the mixture model to estimate the model parameters.
Experiments on a simulated dataset demonstrate the effectiveness of the method in clustering, parameter estimation, and model selection.
Date: 2022-08-25T07:41:23Z
- Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which differences between users' models are penalized.
Date: 2022-02-01T19:25:31Z
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When the framework is instantiated with simple linear autoregressive models, it achieves state-of-the-art results on several benchmark datasets.
Date: 2021-10-26T20:41:19Z
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
Date: 2020-09-27T14:16:14Z
- Autoencoder-based time series clustering with energy applications [0.0]
Time series clustering is a challenging task due to the specific nature of the data.
In this paper we investigate the combination of a convolutional autoencoder and a k-medoids algorithm to perform time series clustering.
Date: 2020-02-10T10:04:29Z
- CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data.
In the self-supervised setting for learning the search strategy, we evaluate the proposed algorithm on multi-homography estimation and demonstrate accuracy superior to state-of-the-art methods.
Date: 2020-01-08T17:37:01Z
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared, for one of the aggregation criteria with L1 dissimilarity, against hierarchical clustering and a k-means variant, partitioning around medoids (PAM).
Date: 2020-01-06T23:33:31Z