Enhancing Cluster Quality of Numerical Datasets with Domain Ontology
- URL: http://arxiv.org/abs/2304.00653v1
- Date: Sun, 2 Apr 2023 23:40:17 GMT
- Authors: Sudath Rohitha Heiyanthuduwage, Md Anisur Rahman and Md Zahidul Islam
- Abstract summary: Ontology-based clustering can produce either high-quality or low-quality clusters from a dataset.
We present a clustering approach that uses a domain ontology to reduce the dimensionality of attributes in a numerical dataset.
The experimental results of our approach indicate that cluster quality gradually improves from the lower to the higher levels of a domain ontology.
- Score: 2.790947019327459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ontology-based clustering has gained attention in recent years due to the
potential benefits of ontology. Current ontology-based clustering approaches
have mainly been applied to reduce the dimensionality of attributes in text
document clustering. Reducing the dimensionality of attributes with an ontology
helps to produce high-quality clusters for a dataset. However, ontology-based
approaches to clustering numerical datasets have not gained enough
attention. Moreover, some literature mentions that ontology-based clustering
can produce either high-quality or low-quality clusters from a dataset.
Therefore, in this paper we present a clustering approach that uses a
domain ontology to reduce the dimensionality of attributes in a numerical
dataset and to produce high-quality clusters. For every dataset, we use the
domain ontology to produce three datasets. We then cluster these
datasets using a genetic algorithm-based clustering technique called
GenClust++. The clusters of each dataset are evaluated in terms of the Sum of
Squared Errors (SSE). We use six numerical datasets to evaluate the performance
of our ontology-based approach. The experimental results of our approach
indicate that cluster quality gradually improves from the lower to the higher
levels of a domain ontology.
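The abstract does not spell out how attributes are merged into higher-level ontology concepts, so the following is only a minimal Python sketch under stated assumptions: a hypothetical two-level ontology over made-up attributes (`math`, `physics`, `history`, `literature`), attribute values merged by averaging, and fixed cluster labels standing in for a clustering result, since GenClust++ is not reproduced here. SSE is taken as the sum, over all clusters, of squared Euclidean distances between each record and its cluster centroid. The names `aggregate_to_level` and `sse` are illustrative, not the authors' code.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_to_level(rows, attributes, parent_of):
    """Replace attributes by their parent concepts in a hypothetical ontology.

    rows:       list of records, each a list of numbers aligned with `attributes`
    attributes: attribute (or concept) names for the current level
    parent_of:  mapping from a name to its parent concept one level up
    Attributes that share a parent are merged by averaging their values
    (an assumption for illustration; the paper may merge them differently).
    """
    concepts = []                # parent concepts in first-seen order
    members = defaultdict(list)  # concept -> column indices of its children
    for idx, attr in enumerate(attributes):
        concept = parent_of.get(attr, attr)  # no parent: keep the attribute as-is
        if concept not in members:
            concepts.append(concept)
        members[concept].append(idx)
    new_rows = [[fmean(row[i] for i in members[c]) for c in concepts] for row in rows]
    return new_rows, concepts


def sse(rows, labels):
    """Sum of squared Euclidean distances from each record to its cluster centroid."""
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)
    total = 0.0
    for members in clusters.values():
        dims = range(len(members[0]))
        centroid = [fmean(m[d] for m in members) for d in dims]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in dims)
    return total


# Toy data: four records over four made-up leaf attributes.
attributes = ["math", "physics", "history", "literature"]
rows = [[90, 80, 40, 50], [85, 95, 45, 35], [30, 40, 80, 90], [35, 25, 95, 85]]
labels = [0, 0, 1, 1]  # fixed labels standing in for a clustering result (not GenClust++)

# Hypothetical two-level ontology: leaf -> level 1 -> level 2.
level1_parent = {"math": "science", "physics": "science",
                 "history": "humanities", "literature": "humanities"}
level2_parent = {"science": "subject", "humanities": "subject"}

level1, level1_attrs = aggregate_to_level(rows, attributes, level1_parent)
level2, level2_attrs = aggregate_to_level(level1, level1_attrs, level2_parent)

print(sse(rows, labels), sse(level1, labels), sse(level2, labels))  # 500.0 50.0 0.0
```

In this toy example SSE falls at each level partly because averaging shrinks both the number of attributes and their spread, so the numbers illustrate the mechanics only, not the paper's experimental comparison.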
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
To provide feedback on actions, a clustering-oriented reward function is proposed that enhances the cohesion within each cluster and the separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Using Decision Trees for Interpretable Supervised Clustering [0.0]
Supervised clustering aims to form clusters of labelled data with high probability densities.
We are particularly interested in finding clusters of data of a given class and describing the clusters with a set of comprehensive rules.
arXiv Detail & Related papers (2023-07-16T17:12:45Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the lower bounds derived in the paper both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Clustering Optimisation Method for Highly Connected Biological Data [0.0]
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
arXiv Detail & Related papers (2022-08-08T17:33:32Z)
- Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP, which infers the topological structure, with the density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z)
- DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)