Hierarchical Clustering using Reversible Binary Cellular Automata for High-Dimensional Data
- URL: http://arxiv.org/abs/2408.02250v1
- Date: Mon, 5 Aug 2024 05:48:45 GMT
- Title: Hierarchical Clustering using Reversible Binary Cellular Automata for High-Dimensional Data
- Authors: Baby C. J., Kamalika Bhattacharjee
- Abstract summary: In cellular automaton (CA) based clustering, if two objects belong to the same cycle, they are closely related and considered as part of the same cluster.
This paper identifies the relationship between objects in two different cycles based on the median of all elements in each cycle so that they can be grouped in the next stage.
When verified over standard benchmark datasets with various performance metrics, our algorithm is on par with the existing algorithms with quadratic time complexity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a hierarchical clustering algorithm for high-dimensional datasets using the cyclic space of reversible finite cellular automata. In cellular automaton (CA) based clustering, if two objects belong to the same cycle, they are closely related and considered as part of the same cluster. However, if a high-dimensional dataset is clustered using the cycles of one CA, closely related objects may belong to different cycles. This paper identifies the relationship between objects in two different cycles based on the median of all elements in each cycle so that they can be grouped in the next stage. Further, to minimize the number of intermediate clusters, which in turn reduces the computational cost, a rule selection strategy is adopted to find the best rules based on information propagation and cycle structure. After encoding the dataset using frequency-based encoding such that consecutive data elements maintain a minimum Hamming distance in encoded form, our proposed clustering algorithm iterates over three stages to finally cluster the data elements into the desired number of clusters given by the user. This algorithm can be applied to various fields, including healthcare, sports, chemical research, agriculture, etc. When verified over standard benchmark datasets with various performance metrics, our algorithm is on par with the existing algorithms with quadratic time complexity.
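As a rough illustration of the cycle-based clustering idea in the abstract, the Python sketch below assigns binary-encoded objects to the cycles of a trivially reversible elementary CA (rule 170, a cyclic shift, under periodic boundary) and then greedily merges intermediate clusters by the Hamming distance between their bitwise medians. The rule choice, the toy encoding, and the flat greedy merge are illustrative assumptions; the paper's frequency-based encoding, rule-selection strategy, and three-stage pipeline are not reproduced here.

# Minimal sketch of CA-cycle-based clustering; elementary rule 170 (a cyclic
# left shift under periodic boundary) stands in for the paper's selected rules.
N_CELLS = 8  # length of an encoded object (illustrative)

def step(config):
    """One step of ECA rule 170 on a periodic lattice: s_i(t+1) = s_{i+1}(t)."""
    return config[1:] + config[:1]

def cycle_of(config):
    """Return the CA cycle (as a frozenset of configurations) containing `config`."""
    seen, cur = [], config
    while cur not in seen:
        seen.append(cur)
        cur = step(cur)
    return frozenset(seen)

def primary_clusters(objects):
    """Stage 1: objects lying on the same CA cycle form one intermediate cluster."""
    clusters = {}
    for obj in objects:
        clusters.setdefault(cycle_of(obj), []).append(obj)
    return list(clusters.values())

def median_config(cluster):
    """Bitwise (majority) median of a cluster, used to compare clusters."""
    half = len(cluster) / 2
    return tuple(int(sum(bits) >= half) for bits in zip(*cluster))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def merge_to_k(clusters, k):
    """Later stages (simplified): repeatedly merge the two clusters whose
    medians are closest in Hamming distance until k clusters remain."""
    while len(clusters) > k:
        meds = [median_config(c) for c in clusters]
        i, j = min(((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
                   key=lambda ab: hamming(meds[ab[0]], meds[ab[1]]))
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

# Toy binary-encoded objects; the paper derives these via frequency-based encoding.
objects = [tuple(map(int, f"{v:0{N_CELLS}b}")) for v in (3, 6, 129, 130, 0, 255)]
print(merge_to_k(primary_clusters(objects), k=2))

Because the chosen rule is a bijection on the configuration space, every configuration lies on exactly one cycle, which is what lets the cycle structure partition the encoded objects into intermediate clusters.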
Related papers
- Gödel Number based Clustering Algorithm with Decimal First Degree Cellular Automata [0.0]
In this paper, a decimal first degree cellular automata (FDCA) based clustering algorithm is proposed.
Data objects are encoded into decimal strings using Gödel number based encoding (a minimal encoding sketch appears after this list).
In comparison with the existing state-of-the-art clustering algorithms, our proposed algorithm gives better performance.
arXiv Detail & Related papers (2024-05-08T08:30:34Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- Determining the Optimal Number of Clusters for Time Series Datasets with Symbolic Pattern Forest [0.0]
The problem of calculating the optimal number of clusters (say k) is one of the significant challenges for such methods.
In this work, we extended the Symbolic Pattern Forest algorithm to determine the optimal number of clusters for the time series datasets.
We tested our approach on the UCR archive datasets, and our experimental results so far showed significant improvement over the baseline.
arXiv Detail & Related papers (2023-10-01T23:33:37Z)
- Data Aggregation for Hierarchical Clustering [0.3626013617212666]
BETULA is a numerically stable version of the well-known BIRCH data aggregation algorithm.
It can be used to make HAC viable on systems with constrained resources, with only small losses in clustering quality (an aggregation-then-HAC sketch appears after this list).
arXiv Detail & Related papers (2023-09-05T19:39:43Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data [1.827510863075184]
This paper proposes a method to cluster data that share the same intersections between two features or more.
The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm.
The algorithm is also able to find the optimal number of clusters for the FCM and the k-means algorithm, according to the consistency of the clusters given by the Silhouette Index (SI); a sketch of SI-based selection of k appears after this list.
arXiv Detail & Related papers (2022-06-17T19:29:50Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, uniformly random selection of center points fails both to ensure diversity and to obtain good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Similarity-based Distance for Categorical Clustering using Space Structure [5.543220407902113]
We propose a novel distance metric, similarity-based distance (SBD), to measure the distance between objects of categorical data.
Our proposed distance (SBD) significantly outperforms existing algorithms such as k-modes and other SBC-type algorithms on categorical datasets.
arXiv Detail & Related papers (2020-11-19T15:18:26Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretic error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
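For the Gödel-number-based encoding mentioned in the first related paper, the sketch below uses classical Gödel numbering (a product of prime powers) to turn an integer feature vector into a decimal string; the exact encoding used in that paper may differ, so treat this as a hedged illustration only.

# Hedged sketch of classical Goedel numbering for integer feature vectors;
# the decimal-FDCA paper's actual encoding scheme may differ.

def first_primes(n):
    """Return the first n primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def goedel_encode(features):
    """Map non-negative integer features (f1, ..., fn) to the decimal string
    of 2**f1 * 3**f2 * 5**f3 * ..."""
    code = 1
    for p, f in zip(first_primes(len(features)), features):
        code *= p ** f
    return str(code)

print(goedel_encode((2, 0, 3)))  # 2**2 * 3**0 * 5**3 = "500"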
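For the data-aggregation idea in the BETULA entry, the sketch below uses scikit-learn's BIRCH (the algorithm that BETULA makes numerically stable; BETULA itself is not in scikit-learn) to compress the data into subcluster centers before running hierarchical agglomerative clustering on that small summary. The threshold, toy dataset, and number of clusters are assumptions.

# Hedged sketch of aggregate-then-cluster: BIRCH compresses the data into
# subcluster centers, and HAC runs only on that much smaller summary.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=5000, centers=5, random_state=0)  # toy data

# Stage 1: aggregate raw points into CF subclusters (no global clustering yet).
birch = Birch(threshold=0.8, n_clusters=None).fit(X)
centers = birch.subcluster_centers_            # compact summary of the data

# Stage 2: hierarchical agglomerative clustering on the summary only.
hac = AgglomerativeClustering(n_clusters=5).fit(centers)

# Map each original point to the HAC label of its nearest subcluster center.
nearest = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
point_labels = hac.labels_[nearest]
print(len(centers), "subclusters,", len(set(point_labels)), "final clusters")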
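For the Silhouette-based choice of the number of clusters mentioned in the ck-means entry, the sketch below shows only that selection step, using plain k-means from scikit-learn instead of the paper's FCM/k-means combination; the candidate range of k and the toy dataset are assumptions.

# Hedged sketch of choosing k by the Silhouette Index (SI); it illustrates only
# the SI-based selection step, not the ck-means algorithm itself.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # toy data

best_k, best_score = None, -1.0
for k in range(2, 9):  # candidate numbers of clusters (illustrative range)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # mean Silhouette Index for this k
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))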