Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach
- URL: http://arxiv.org/abs/2207.06949v4
- Date: Thu, 19 Oct 2023 13:23:19 GMT
- Title: Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach
- Authors: Dimitrios Saligkaras and Vasileios E. Papageorgiou
- Abstract summary: Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together.
This article provides a deep description of the most widely used clustering methodologies.
It emphasizes the comparison of these algorithms' clustering efficiency on three datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is an unsupervised machine learning methodology in which
unlabeled elements/objects are grouped together, aiming at the construction of
well-established clusters whose elements are classified according to their
similarity. The goal of this process is to provide a useful aid to researchers,
helping them identify patterns among the data. When dealing with large
databases, such patterns may not be easily detectable without the contribution
of a clustering algorithm. This article provides an in-depth description of the
most widely used clustering methodologies, accompanied by practical guidance on
suitable parameter selection and initialization. At the same time, the article
is not only a review highlighting the major elements of the examined clustering
techniques; it also compares these algorithms' clustering efficiency on three
datasets, revealing their weaknesses and capabilities in terms of accuracy and
complexity when confronting discrete and continuous observations. The produced
results lead to valuable conclusions about the appropriateness of the examined
clustering techniques with respect to dataset size.
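The kind of comparison the abstract describes can be illustrated with a short
script. The sketch below is a hypothetical, minimal example and not the
authors' protocol: it assumes scikit-learn, a synthetic dataset, and a
k-means vs. agglomerative pairing that are not taken from the paper, and it
reports an accuracy proxy (Adjusted Rand Index) together with runtime,
mirroring the accuracy/complexity trade-off the study evaluates.

```python
# Hedged illustration: the algorithms, dataset, and metric below are
# assumptions chosen for demonstration; they are not the paper's exact setup.
import time

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for one of the evaluated datasets.
X, y_true = make_blobs(n_samples=1000, centers=4, random_state=0)

algorithms = {
    "k-means (k-means++ init)": KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0),
    "agglomerative (Ward linkage)": AgglomerativeClustering(n_clusters=4, linkage="ward"),
}

for name, algo in algorithms.items():
    start = time.perf_counter()
    labels = algo.fit_predict(X)
    elapsed = time.perf_counter() - start
    # Adjusted Rand Index compares the produced clusters with the known labels,
    # serving as a simple stand-in for the accuracy criterion described above.
    print(f"{name}: ARI={adjusted_rand_score(y_true, labels):.3f}, time={elapsed:.3f}s")
```

On large datasets the runtime gap typically favours k-means, since standard
agglomerative clustering scales at least quadratically in the number of points,
which is one reason dataset size matters when judging algorithm appropriateness.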
Related papers
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Using Decision Trees for Interpretable Supervised Clustering [0.0]
Supervised clustering aims at forming clusters of labelled data with high probability densities.
We are particularly interested in finding clusters of data of a given class and describing the clusters with a set of comprehensive rules.
arXiv Detail & Related papers (2023-07-16T17:12:45Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- Clustering Optimisation Method for Highly Connected Biological Data [0.0]
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
arXiv Detail & Related papers (2022-08-08T17:33:32Z)
- A review of systematic selection of clustering algorithms and their evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z)
- Learning the Precise Feature for Cluster Assignment [39.320210567860485]
We propose a framework which integrates representation learning and clustering into a single pipeline for the first time.
The proposed framework exploits the powerful ability of recently developed generative models for learning intrinsic features.
Experimental results show that the performance of the proposed method is superior, or at least comparable to, the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-11T04:08:54Z)
- Integrating Auxiliary Information in Self-supervised Learning [94.11964997622435]
We first observe that the auxiliary information may provide useful information about the structure of the data.
We propose constructing data clusters according to the auxiliary information.
We show that Cl-InfoNCE may be a better approach to leverage the data clustering information.
arXiv Detail & Related papers (2021-06-05T11:01:15Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.