Seeking the Truth Beyond the Data. An Unsupervised Machine Learning
Approach
- URL: http://arxiv.org/abs/2207.06949v4
- Date: Thu, 19 Oct 2023 13:23:19 GMT
- Title: Seeking the Truth Beyond the Data. An Unsupervised Machine Learning
Approach
- Authors: Dimitrios Saligkaras and Vasileios E. Papageorgiou
- Abstract summary: Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together.
This article provides a deep description of the most widely used clustering methodologies.
It emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is an unsupervised machine learning methodology where unlabeled
elements/objects are grouped together aiming to the construction of
well-established clusters that their elements are classified according to their
similarity. The goal of this process is to provide a useful aid to the
researcher that will help her/him to identify patterns among the data. Dealing
with large databases, such patterns may not be easily detectable without the
contribution of a clustering algorithm. This article provides a deep
description of the most widely used clustering methodologies accompanied by
useful presentations concerning suitable parameter selection and
initializations. Simultaneously, this article not only represents a review
highlighting the major elements of examined clustering techniques but
emphasizes the comparison of these algorithms' clustering efficiency based on 3
datasets, revealing their existing weaknesses and capabilities through accuracy
and complexity, during the confrontation of discrete and continuous
observations. The produced results help us extract valuable conclusions about
the appropriateness of the examined clustering techniques in accordance with
the dataset's size.
Related papers
- Order Is All You Need for Categorical Data Clustering [29.264630563297466]
Categorical data composed of nominal valued attributes are ubiquitous in knowledge discovery and data mining tasks.
Due to the lack of well-defined metric space, categorical data distributions are difficult to intuitively understand.
This paper introduces the new finding that the order relation among attribute values is the decisive factor in clustering accuracy.
arXiv Detail & Related papers (2024-11-19T08:23:25Z) - GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Clustering (GCC) method to incorporate feature learning and augmentation into clustering procedure.
First, we develop a discrimirative feature alignment mechanism to discover intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Robust and Automatic Data Clustering: Dirichlet Process meets
Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z) - Using Decision Trees for Interpretable Supervised Clustering [0.0]
supervised clustering aims at forming clusters of labelled data with high probability densities.
We are particularly interested in finding clusters of data of a given class and describing the clusters with the set of comprehensive rules.
arXiv Detail & Related papers (2023-07-16T17:12:45Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Clustering Optimisation Method for Highly Connected Biological Data [0.0]
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
arXiv Detail & Related papers (2022-08-08T17:33:32Z) - A review of systematic selection of clustering algorithms and their
evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z) - Integrating Auxiliary Information in Self-supervised Learning [94.11964997622435]
We first observe that the auxiliary information may bring us useful information about data structures.
We present to construct data clusters according to the auxiliary information.
We show that Cl-InfoNCE may be a better approach to leverage the data clustering information.
arXiv Detail & Related papers (2021-06-05T11:01:15Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.