nTreeClus: a Tree-based Sequence Encoder for Clustering Categorical
Series
- URL: http://arxiv.org/abs/2102.10252v1
- Date: Sat, 20 Feb 2021 03:58:17 GMT
- Title: nTreeClus: a Tree-based Sequence Encoder for Clustering Categorical
Series
- Authors: Hadi Jahanshahi and Mustafa Gokce Baydogan
- Abstract summary: This paper proposes a new model-based approach for clustering sequence data, namely nTreeClus.
Adopting this new representation, we cluster sequences, considering the inherent patterns in categorical time series.
The empirical evaluation using synthetic and real datasets, protein sequences, and categorical time series showed that nTreeClus is competitive or superior to most state-of-the-art algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The overwhelming presence of categorical/sequential data in diverse domains
emphasizes the importance of sequence mining. The challenging nature of sequences underscores the need for continued research into more accurate and faster approaches that provide a better understanding of their (dis)similarities.
This paper proposes a new model-based approach for clustering sequence data, namely nTreeClus. The proposed method deploys tree-based learners, k-mers, and autoregressive models for categorical time series, culminating in a novel numerical representation of the categorical sequences. Adopting this new
representation, we cluster sequences, considering the inherent patterns in
categorical time series. Accordingly, the model showed robustness to its parameter setting. Under different simulated scenarios, nTreeClus improved on the baseline methods by up to 10.7% and 2.7% for internal and external cluster validation metrics, respectively. The empirical evaluation using synthetic and real
datasets, protein sequences, and categorical time series showed that nTreeClus
is competitive or superior to most state-of-the-art algorithms.
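To make the encoding pipeline described above concrete, here is a minimal sketch, under stated assumptions, of how a tree-based k-mer encoding and subsequent clustering could be wired together. The window length `n`, the single decision tree, the one-hot encoding of symbols, and the cosine-distance hierarchical clustering are illustrative choices inferred from the abstract, not the authors' exact implementation.

```python
# Hypothetical sketch of a tree-based k-mer encoding for categorical sequences.
# Parameter names and individual steps are illustrative, not the paper's code.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def encode_and_cluster(sequences, n=3, n_clusters=2):
    # 1) Slide a window of length n over every sequence: the first n-1 symbols
    #    act as predictors, the last symbol as the response (autoregressive view).
    rows, y, seq_ids = [], [], []
    for sid, seq in enumerate(sequences):
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            rows.append(list(window[:-1]))
            y.append(window[-1])
            seq_ids.append(sid)
    X = pd.get_dummies(pd.DataFrame(rows).astype(str))  # one-hot encode the symbols

    # 2) Fit a decision tree that predicts the next symbol from its context.
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)

    # 3) Represent each sequence by how often its windows land in each terminal
    #    node of the tree (a numerical "sequence x leaf" profile).
    leaves = tree.apply(X)
    leaf_ids = np.unique(leaves)
    profile = np.zeros((len(sequences), leaf_ids.size))
    for sid, leaf in zip(seq_ids, leaves):
        profile[sid, np.searchsorted(leaf_ids, leaf)] += 1
    profile /= profile.sum(axis=1, keepdims=True)        # normalise per sequence

    # 4) Cluster the profiles, here with cosine-distance hierarchical clustering.
    Z = linkage(pdist(profile, metric="cosine"), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

labels = encode_and_cluster(["AABAB", "ABABB", "CDCDC", "DCDCC"], n=3, n_clusters=2)
print(labels)  # e.g. [1 1 2 2]: the AB-like and CD-like sequences separate
```

The intuition is that sequences with similar next-symbol patterns send their k-mers to the same terminal nodes, so their leaf-occupancy profiles, and hence their pairwise (dis)similarities, end up close.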
Related papers
- Approximate learning of parsimonious Bayesian context trees [0.0]
The proposed framework is tested on synthetic and real-world data examples.
It outperforms existing sequence models when fitted to real protein sequences and honeypot computer terminal sessions.
arXiv Detail & Related papers (2024-07-27T11:50:40Z)
- Interpretable Sequence Clustering [3.280979689839737]
We propose a method called Interpretable Sequence Clustering Tree (ISCT).
ISCT generates k leaf nodes corresponding to k clusters, providing an intuitive explanation of how each cluster is formed.
Experimental results on 14 real-world data sets demonstrate that our proposed method provides an interpretable tree structure.
arXiv Detail & Related papers (2023-09-03T11:31:44Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback on the chosen actions, a clustering-oriented reward function is proposed that increases the cohesion within clusters and separates different clusters.
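The summary only names the goal of the reward (more cohesion within clusters, more separation between them); the snippet below is a generic, hypothetical cohesion-minus-separation score over node embeddings, not the paper's actual reward function.

```python
# Generic sketch of a clustering-oriented reward: average within-cluster
# similarity minus average between-cluster similarity of node embeddings.
# Illustrative stand-in only; not the reward defined in the paper.
import numpy as np

def clustering_reward(embeddings, labels):
    Z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = Z @ Z.T                                   # cosine similarity matrix
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                 # ignore self-similarity
    diff = ~same
    np.fill_diagonal(diff, False)
    cohesion = S[same].mean() if same.any() else 0.0
    separation = S[diff].mean() if diff.any() else 0.0
    return cohesion - separation                  # higher is better

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(2, 0.1, (5, 3)), rng.normal(-2, 0.1, (5, 3))])
good = np.array([0] * 5 + [1] * 5)
bad = rng.integers(0, 2, 10)
print(clustering_reward(Z, good), clustering_reward(Z, bad))  # aligned labels should score higher
```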
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
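The merge rule is simple enough to sketch directly: the toy implementation below runs agglomerative clustering that repeatedly merges the pair of clusters with the largest average pairwise dot product, rather than the smallest distance. It is an illustrative re-implementation of the stated rule, not the authors' code.

```python
# Toy agglomerative clustering that merges by maximum average dot product.
import numpy as np

def dot_product_agglomerative(X, n_clusters):
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average dot product over all cross-cluster pairs of points.
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best:
                    best, best_pair = score, (a, b)
        a, b = best_pair
        clusters[a] += clusters[b]       # merge the highest-scoring pair
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3, 1, (5, 2)), rng.normal(-3, 1, (5, 2))])
print(dot_product_agglomerative(X, n_clusters=2))  # e.g. [0 0 0 0 0 1 1 1 1 1]
```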
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Seq-HyGAN: Sequence Classification via Hypergraph Attention Network [0.0]
Sequence classification has a wide range of real-world applications in different domains, such as genome classification in health and anomaly detection in business.
The lack of explicit features makes sequence data difficult for machine learning models to work with.
We propose a novel Hypergraph Attention Network model, namely Seq-HyGAN.
arXiv Detail & Related papers (2023-03-04T11:53:33Z)
- SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees.
We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves.
In our evaluation, the proposed tree and forest models are able to achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
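As a rough illustration of a forecasting tree with regression models in its leaves, the hedged sketch below makes a single threshold split on one lagged value and fits a pooled linear AR model in each child node; the actual SETAR-Tree split-selection, significance testing, and stopping rules are more elaborate.

```python
# Toy one-split forecasting tree with pooled linear AR models in its leaves.
# Illustrative only; not the SETAR-Tree algorithm itself.
import numpy as np
from sklearn.linear_model import LinearRegression

def embed(series_list, p):
    # Pool lagged windows (X) and next values (y) from all series.
    X, y = [], []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        for t in range(p, len(s)):
            X.append(s[t - p:t])
            y.append(s[t])
    return np.array(X), np.array(y)

def fit_one_split_tree(series_list, p=2, split_lag=0):
    X, y = embed(series_list, p)
    best = None
    for thr in np.quantile(X[:, split_lag], np.linspace(0.1, 0.9, 9)):
        left, right = X[:, split_lag] <= thr, X[:, split_lag] > thr
        if left.sum() < p + 1 or right.sum() < p + 1:
            continue                      # skip degenerate splits
        ml = LinearRegression().fit(X[left], y[left])
        mr = LinearRegression().fit(X[right], y[right])
        sse = (np.sum((y[left] - ml.predict(X[left])) ** 2)
               + np.sum((y[right] - mr.predict(X[right])) ** 2))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)     # keep the split with the lowest SSE
    return best

rng = np.random.default_rng(0)
series = [np.cumsum(rng.normal(size=30)) for _ in range(5)]
sse, thr, left_model, right_model = fit_one_split_tree(series)
print(round(sse, 2), round(thr, 2))
```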
arXiv Detail & Related papers (2022-11-16T04:30:42Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
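A hedged sketch of the cluster-then-forecast idea follows, instantiated with k-means and pooled linear AR(p) models; both are illustrative stand-ins, since the framework itself allows any clustering and forecasting methods.

```python
# Hedged sketch of a cluster-then-forecast pipeline: cluster the series,
# fit one pooled linear AR(p) model per cluster, forecast each member series.
# The clustering method, AR order p, and pooling scheme are illustrative choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def cluster_and_forecast(series, n_clusters=2, p=2):
    series = np.asarray(series, dtype=float)       # shape: (n_series, T)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(series)

    forecasts = np.empty(len(series))
    for k in range(n_clusters):
        member = np.where(labels == k)[0]
        X, y = [], []
        for i in member:                            # pool lagged windows per cluster
            s = series[i]
            for t in range(p, len(s)):
                X.append(s[t - p:t])
                y.append(s[t])
        model = LinearRegression().fit(np.array(X), np.array(y))
        forecasts[member] = model.predict(series[member][:, -p:])  # one step ahead
    return labels, forecasts

t = np.arange(20)
series = [np.sin(0.5 * t), np.sin(0.5 * t + 0.1), 0.1 * t, 0.1 * t + 0.2]
labels, preds = cluster_and_forecast(series, n_clusters=2, p=2)
print(labels, np.round(preds, 2))
```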
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- T-LoHo: A Bayesian Regularization Model for Structured Sparsity and Smoothness on Graphs [0.0]
In graph-structured data, nonzero parameters tend to cluster together, exhibiting both structured sparsity and smoothness.
We propose a new prior for high dimensional parameters with graphical relations.
We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
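The self-expressiveness idea can be sketched generically: each sample is expressed as a combination of the other samples, and the coefficient magnitudes are used as an affinity graph. The ridge-regularised closed form and the spectral-clustering step below are illustrative simplifications, not the paper's formulation.

```python
# Generic self-expressiveness sketch: solve X ~= Z X with a ridge penalty and
# use |Z| as an affinity graph.  Illustrative stand-in, not the paper's model.
import numpy as np
from sklearn.cluster import SpectralClustering

def self_expressive_affinity(X, lam=0.1):
    X = np.asarray(X, dtype=float)                       # rows are samples
    G = X @ X.T                                          # Gram matrix between samples
    Z = np.linalg.solve(G + lam * np.eye(len(X)), G)     # closed-form ridge solution
    np.fill_diagonal(Z, 0.0)                             # a sample should not explain itself
    return (np.abs(Z) + np.abs(Z).T) / 2                 # symmetric, non-negative affinity

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 3)) + [1, 0, 0],
               rng.normal(0, 0.1, (5, 3)) + [0, 1, 0]])
W = self_expressive_affinity(X)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)  # the two generated groups should be recovered
```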
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregation criteria using L1 dissimilarity, against hierarchical clustering and a k-medoids method, partitioning around medoids (PAM).
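For reference, here is a minimal k-medoids (PAM-style) sketch on binary data with L1 dissimilarity. It uses a simplified medoid-update step rather than the full swap-based PAM, and is not the paper's implementation.

```python
# Minimal PAM-style (k-medoids) clustering of binary data under L1 dissimilarity.
# Uses a simplified medoid-update loop; illustrative only.
import numpy as np

def pam_binary(X, k, n_iter=20, seed=0):
    X = np.asarray(X)
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)   # pairwise L1 distances
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)            # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # The medoid is the member minimising total distance to its cluster.
                new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1), medoids

X = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
labels, medoids = pam_binary(X, k=2)
print(labels, medoids)
```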
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.