COHORTNEY: Deep Clustering for Heterogeneous Event Sequences
- URL: http://arxiv.org/abs/2104.01440v1
- Date: Sat, 3 Apr 2021 16:12:21 GMT
- Title: COHORTNEY: Deep Clustering for Heterogeneous Event Sequences
- Authors: Vladislav Zhuzhel, Rodrigo Rivera-Castro, Nina Kaploukhaya, Liliya Mironova, Alexey Zaytsev, Evgeny Burnaev
- Abstract summary: Clustering of event sequences is widely applicable in domains such as healthcare, marketing, and finance.
We propose COHORTNEY as a novel deep learning method for clustering heterogeneous event sequences.
Our results show that COHORTNEY vastly outperforms the state-of-the-art algorithm for clustering event sequences in both speed and cluster quality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is growing attention to working with event sequences. In
particular, clustering of event sequences is widely applicable in domains such
as healthcare, marketing, and finance. Use cases include the analysis of
visitors to websites or hospitals and of bank transactions. Unlike traditional
time series, event sequences tend to be sparse and not equally spaced in time.
As a result, they exhibit different properties, which state-of-the-art methods
must account for.
The community has paid little attention to the specifics of heterogeneous
event sequences. Existing research in clustering primarily focuses on classic
time series data. It is unclear whether methods proposed in the literature
generalize well to event sequences.
Here we propose COHORTNEY as a novel deep learning method for clustering
heterogeneous event sequences. Our contributions include (i) a novel method
combining an LSTM with the EM algorithm, together with a code implementation;
(ii) a comparison of this method to previous research on time series and event
sequence clustering; and (iii) a performance benchmark of different approaches
on a new dataset from the finance industry and fourteen additional datasets.
Our results show that COHORTNEY vastly outperforms the state-of-the-art
algorithm for clustering event sequences in both speed and cluster quality.
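The abstract describes COHORTNEY only as a combination of an LSTM and the EM algorithm; its details are not given here. As a minimal sketch of the general idea of EM clustering for irregular event sequences, assuming each sequence is a list of (timestamp, event_type) pairs, the code below alternates soft cluster assignments (E-step) with per-cluster parameter updates (M-step). It is not the authors' implementation: a deliberately simple per-cluster model (categorical event types plus exponential inter-event gaps) stands in for the LSTM, and the names `seq_stats`, `log_lik`, `em_cluster`, and the parameters `alpha`, `iters`, and `seed` are hypothetical.

```python
import math
import random


def seq_stats(seq, num_types):
    """Return event-type counts and inter-event gaps for one sequence.

    Assumption: a sequence is a list of (timestamp, event_type) pairs sorted
    by timestamp; the gaps between events may be irregular.
    """
    counts = [0] * num_types
    for _, etype in seq:
        counts[etype] += 1
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(seq, seq[1:])]
    return counts, gaps


def log_lik(counts, gaps, type_probs, rate):
    """Log-likelihood of one sequence under one cluster's simple model:
    categorical event types plus exponential inter-event gaps.
    Hypothetical stand-in for a per-cluster LSTM."""
    ll = sum(c * math.log(p) for c, p in zip(counts, type_probs))
    ll += sum(math.log(rate) - rate * g for g in gaps)
    return ll


def em_cluster(seqs, num_types, k, iters=50, seed=0, alpha=0.1):
    """Soft-cluster event sequences with EM; return a hard label per sequence."""
    rng = random.Random(seed)
    stats = [seq_stats(s, num_types) for s in seqs]
    # Random soft initialization of the responsibilities resp[i][j].
    resp = []
    for _ in seqs:
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([x / total for x in row])
    for _ in range(iters):
        # M-step: re-estimate mixture weights, type probabilities and gap rates.
        weights, probs, rates = [], [], []
        for j in range(k):
            weights.append(sum(r[j] for r in resp) / len(seqs))
            # alpha is a pseudo-count that keeps every type probability > 0.
            mass = [alpha + sum(r[j] * c[t] for r, (c, _) in zip(resp, stats))
                    for t in range(num_types)]
            z = sum(mass)
            probs.append([m / z for m in mass])
            # Weighted maximum-likelihood rate of the exponential gap model.
            n_gaps = sum(r[j] * len(g) for r, (_, g) in zip(resp, stats))
            gap_sum = sum(r[j] * sum(g) for r, (_, g) in zip(resp, stats))
            rates.append((n_gaps + 1e-6) / (gap_sum + 1e-6))
        # E-step: posterior cluster probabilities, normalized with log-sum-exp.
        for i, (c, g) in enumerate(stats):
            logs = [math.log(weights[j] + 1e-12) + log_lik(c, g, probs[j], rates[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(x - top) for x in logs]
            z = sum(exps)
            resp[i] = [e / z for e in exps]
    return [max(range(k), key=lambda j: r[j]) for r in resp]


# Usage: dense type-0 events one time unit apart vs. sparse type-1 events
# five units apart; each group should end up in its own cluster.
group_a = [[(t * 1.0, 0) for t in range(5)] for _ in range(3)]
group_b = [[(t * 5.0, 1) for t in range(5)] for _ in range(3)]
labels = em_cluster(group_a + group_b, num_types=2, k=2)
print(labels)
```

Which group receives label 0 depends on the random initialization. Unlike an LSTM, this stand-in summarizes each sequence by counts and gaps and ignores event order, which keeps the EM updates in closed form but discards the history dependence a recurrent model can capture.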
Related papers
- Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters (2024-10-12)
Finding meaningful groups in high-dimensional data is an important challenge in data mining.
Deep clustering methods have achieved remarkable results in these tasks.
Most of these methods require the user to specify the number of clusters in advance.
This is a major limitation, since the number of clusters is typically unknown if labeled data is unavailable.
Most of these approaches estimate the number of clusters separately from the clustering process.
- Clustering of timed sequences -- Application to the analysis of care pathways (2024-04-23)
Revealing typical care pathways can be achieved through clustering.
The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms.
- Time Series Clustering With Random Convolutional Kernels (2023-05-17)
Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining.
One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data.
We introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters.
- Robust Detection of Lead-Lag Relationships in Lagged Multi-Factor Models (2023-05-11)
Key insights can be obtained by discovering lead-lag relationships inherent in the data.
We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models.
- Fuzzy clustering of ordinal time series based on two novel distances with economic applications (2023-04-24)
Two novel distances between ordinal time series are introduced and used to construct fuzzy clustering procedures.
The resulting clustering algorithms are computationally efficient and able to group series generated from similar processes.
Two specific applications involving economic time series illustrate the usefulness of the proposed approaches.
- Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment (2022-10-24)
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, next-event prediction models are trained with sequential data collected at one time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
- Summary Markov Models for Event Sequences (2022-05-06)
We propose a family of models for sequences of different types of events without meaningful time stamps.
The probability of observing an event type depends only on a summary of historical occurrences of its influencing set of event types.
We show that a unique minimal influencing set exists for any set of event types of interest and choice of summary function.
- Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention (2022-02-12)
We present a novel approach to semi-supervised new event type induction using a masked contrastive loss.
We extend our approach to two new tasks: predicting the type name of the discovered clusters and linking them to FrameNet frames.
- Cluster-and-Conquer: A Framework For Time-Series Forecasting (2021-10-26)
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
- Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection (2020-08-31)
We propose OC4Seq, a one-class recurrent neural network for detecting anomalies in discrete event sequences.
Specifically, OC4Seq embeds the discrete event sequences into latent spaces, where anomalies can be easily detected.
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics (2020-01-06)
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, against hierarchical clustering and a version of k-means: partitioning around medoids (PAM).
This list is automatically generated from the titles and abstracts of the papers in this site.