Enhancing Cluster Quality of Numerical Datasets with Domain Ontology
- URL: http://arxiv.org/abs/2304.00653v1
- Date: Sun, 2 Apr 2023 23:40:17 GMT
- Title: Enhancing Cluster Quality of Numerical Datasets with Domain Ontology
- Authors: Sudath Rohitha Heiyanthuduwage, Md Anisur Rahman and Md Zahidul Islam
- Abstract summary: Ontology-based clustering can produce either high-quality or low-quality clusters from a dataset.
We present a clustering approach based on domain ontology that reduces the dimensionality of attributes in a numerical dataset.
The experimental results of our approach indicate that cluster quality gradually improves from the lower to the higher levels of a domain ontology.
- Score: 2.790947019327459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ontology-based clustering has gained attention in recent years due to the
potential benefits of ontology. Current ontology-based clustering approaches
have mainly been applied to reduce the dimensionality of attributes in text
document clustering. Reducing the dimensionality of attributes using ontology
helps to produce high-quality clusters for a dataset. However, ontology-based
approaches to clustering numerical datasets have not gained enough
attention. Moreover, some literature mentions that ontology-based clustering
can produce either high-quality or low-quality clusters from a dataset.
Therefore, in this paper we present a clustering approach based on
domain ontology that reduces the dimensionality of attributes in a numerical
dataset and produces high-quality clusters. For every
dataset, we produce three datasets using domain ontology. We then cluster these
datasets using a genetic algorithm-based clustering technique called
GenClust++. The clusters of each dataset are evaluated in terms of Sum of
Squared-Error (SSE). We use six numerical datasets to evaluate the performance
of our ontology-based approach. The experimental results
indicate that cluster quality gradually improves from the lower to the higher
levels of a domain ontology.
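The abstract evaluates clusters with the Sum of Squared-Error (SSE): the total squared Euclidean distance from each point to its cluster centroid, where a lower SSE indicates tighter clusters. A minimal sketch of this metric follows; it illustrates only the evaluation step, not GenClust++ itself, whose implementation is not described in the abstract:

```python
def sse(clusters):
    """Sum of Squared-Error: for each cluster, sum the squared
    Euclidean distance of every point to the cluster centroid."""
    total = 0.0
    for points in clusters:
        dims = len(points[0])
        # Centroid is the per-dimension mean of the cluster's points.
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dims))
            for p in points
        )
    return total

# Two toy 2-D clusterings: the tighter one yields a lower SSE.
tight = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
loose = [[(0, 0), (0, 4)], [(5, 5), (5, 9)]]
print(sse(tight))  # 1.0
print(sse(loose))  # 16.0
```

In the paper's setup, this score would be computed on the clusters GenClust++ produces for each of the three ontology-derived datasets, allowing a direct quality comparison across ontology levels.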
Related papers
- Hierarchical clustering with maximum density paths and mixture models [39.42511559155036]
Hierarchical clustering is an effective and interpretable technique for analyzing structure in data.
It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets.
Our method addresses this limitation by leveraging a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape.
This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis.
arXiv Detail & Related papers (2025-03-19T15:37:51Z) - Hyperoctant Search Clustering: A Method for Clustering Data in High-Dimensional Hyperspheres [0.0]
We propose a new clustering method based on a topological approach applied to regions of space defined by signs of coordinates (hyperoctants).
According to a density criterion, the method builds clusters of data points based on the partitioning of a graph.
We choose the application of topic detection, which is an important task in text mining.
arXiv Detail & Related papers (2025-03-10T23:41:44Z) - ClusterGraph: a new tool for visualization and compression of multidimensional data [0.0]
This paper adds a layer on top of the output of any clustering algorithm, giving information about the global layout of the clusters obtained from the considered clustering algorithm.
arXiv Detail & Related papers (2024-11-08T09:40:54Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance the cohesion within clusters and the separation between clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Using Decision Trees for Interpretable Supervised Clustering [0.0]
Supervised clustering aims at forming clusters of labelled data with high probability densities.
We are particularly interested in finding clusters of data of a given class and describing the clusters with a set of comprehensive rules.
arXiv Detail & Related papers (2023-07-16T17:12:45Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - Clustering Optimisation Method for Highly Connected Biological Data [0.0]
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
arXiv Detail & Related papers (2022-08-08T17:33:32Z) - Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP, for inferring the topological structure, with the density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z) - DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z) - Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss is designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z) - Attention-driven Graph Clustering Network [49.040136530379094]
We propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN).
AGCN exploits a heterogeneous-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion.
arXiv Detail & Related papers (2021-08-12T02:30:38Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.