Clustering Time Series Data through Autoencoder-based Deep Learning
Models
- URL: http://arxiv.org/abs/2004.07296v1
- Date: Sat, 11 Apr 2020 18:51:13 GMT
- Title: Clustering Time Series Data through Autoencoder-based Deep Learning
Models
- Authors: Neda Tavakoli, Sima Siami-Namini, Mahdi Adl Khanghah, Fahimeh Mirza
Soltani, Akbar Siami Namin
- Abstract summary: This paper introduces a two-stage method for clustering time series data.
First, a technique is introduced to utilize the characteristics of given time series data in order to create labels.
Second, an autoencoder-based deep learning model is built to learn and model both known and hidden features of time series data.
- Score: 1.0499611180329802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning and in particular deep learning algorithms are the emerging
approaches to data analysis. These techniques have transformed traditional data
mining-based analysis radically into a learning-based model in which existing
data sets along with their cluster labels (i.e., train set) are learned to
build a supervised learning model and predict the cluster labels of unseen data
(i.e., test set). In particular, deep learning techniques are capable of
capturing and learning hidden features in a given data set and thus building a
more accurate prediction model for clustering and labeling problems. However,
the major problem is that time series data are often unlabeled and thus
supervised learning-based deep learning algorithms cannot be directly adapted
to solve the clustering problems for these special and complex types of data
sets. To address this problem, this paper introduces a two-stage method for
clustering time series data. First, a novel technique is introduced to utilize
the characteristics (e.g., volatility) of given time series data in order to
create labels and thus be able to transform the problem from unsupervised
learning into supervised learning. Second, an autoencoder-based deep learning
model is built to learn and model both known and hidden features of time series
data along with their created labels to predict the labels of unseen time
series data. The paper reports a case study in which financial and stock time
series data of 70 selected stock indices are clustered into distinct groups
using the introduced two-stage procedure. The results show that the proposed
procedure is capable of achieving 87.5% accuracy in clustering and predicting
the labels for unseen time series data.
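The first stage described in the abstract (creating labels from a characteristic such as volatility) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: volatility is measured as the standard deviation of log returns, the series are grouped by volatility quantile, the number of groups is three, and the data are synthetic random walks. The function names `volatility` and `volatility_labels` are hypothetical. The second stage (the autoencoder-based model trained on these labels) is not shown.

```python
import numpy as np


def volatility(prices):
    """Standard deviation of log returns for one price series (assumed measure)."""
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))
    return float(np.std(log_returns))


def volatility_labels(series_list, n_groups=3):
    """Assign each series a label in 0..n_groups-1 by volatility quantile.

    The quantile grouping is an illustrative choice, not the paper's rule.
    """
    vols = np.array([volatility(s) for s in series_list])
    # Interior cut points, e.g. the 1/3 and 2/3 quantiles when n_groups == 3.
    cuts = np.quantile(vols, np.linspace(0, 1, n_groups + 1)[1:-1])
    labels = np.searchsorted(cuts, vols, side="right")
    return labels, vols


# Hypothetical synthetic data: six random-walk price series whose
# log-return noise scale increases from calm to turbulent.
rng = np.random.default_rng(0)
scales = [0.001, 0.002, 0.01, 0.02, 0.05, 0.1]
series = [100 * np.exp(np.cumsum(rng.normal(0, s, 250))) for s in scales]

labels, vols = volatility_labels(series)
for s, v, lab in zip(scales, vols, labels):
    print(f"scale={s:<6} volatility={v:.4f} label={lab}")
```

Labels produced this way turn the unlabeled series into a supervised training set, which is the transformation the abstract describes before the autoencoder-based model is trained.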
Related papers
- An End-to-End Model for Time Series Classification In the Presence of Missing Values [25.129396459385873]
Time series classification with missing data is a prevalent issue in time series analysis.
This study proposes an end-to-end neural network that unifies data imputation and representation learning within a single framework.
arXiv Detail & Related papers (2024-08-11T19:39:12Z) - Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach [36.47860223750303]
We consider the problem of automatic curation of high-quality datasets for self-supervised pre-training.
We propose a clustering-based approach for building datasets that satisfy all these criteria.
Our method involves successive and hierarchical applications of $k$-means on a large and diverse data repository.
arXiv Detail & Related papers (2024-05-24T14:58:51Z) - Pushing the Limits of Pre-training for Time Series Forecasting in the
CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples.
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z) - Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint
Localization [88.74813798138466]
Localizing keypoints of an object is a basic visual problem.
Supervised learning of a keypoint localization network often requires a large amount of data.
We propose to automatically select reliable pseudo-labeled samples with a series of dynamic thresholds.
arXiv Detail & Related papers (2022-01-21T09:51:58Z) - Novel Features for Time Series Analysis: A Complex Networks Approach [62.997667081978825]
Time series data are ubiquitous in several domains such as climate, economics and health care.
A recent conceptual approach relies on mapping time series to complex networks.
Network analysis can be used to characterize different types of time series.
arXiv Detail & Related papers (2021-10-11T13:46:28Z) - Deep Time Series Models for Scarce Data [8.673181404172963]
Time series data have grown at an explosive rate in numerous domains and have stimulated a surge of time series modeling research.
Data scarcity is a universal issue that occurs in a vast range of data analytics problems.
arXiv Detail & Related papers (2021-03-16T22:16:54Z) - Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z) - Deep learning for time series classification [2.0305676256390934]
Time series analysis allows us to visualize and understand the evolution of a process over time.
Time series classification consists of constructing algorithms dedicated to automatically label time series data.
Deep learning has emerged as one of the most effective methods for tackling the supervised classification task.
arXiv Detail & Related papers (2020-10-01T17:38:40Z) - Autoencoder-based time series clustering with energy applications [0.0]
Time series clustering is a challenging task due to the specific nature of the data.
In this paper we investigate the combination of a convolutional autoencoder and a k-medoids algorithm to perform time series clustering.
arXiv Detail & Related papers (2020-02-10T10:04:29Z)