A review of systematic selection of clustering algorithms and their
evaluation
- URL: http://arxiv.org/abs/2106.12792v1
- Date: Thu, 24 Jun 2021 07:01:46 GMT
- Title: A review of systematic selection of clustering algorithms and their
evaluation
- Authors: Marc Wegmann, Domenique Zipperling, Jonas Hillenbrand and Jürgen
Fleischer
- Abstract summary: This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data analysis plays an indispensable role for value creation in industry.
Cluster analysis in this context is able to explore given datasets with little
or no prior knowledge and to identify unknown patterns. As (big) data
complexity increases in the dimensions volume, variety, and velocity, this
becomes even more important. Many tools for cluster analysis have been
developed from early on and the variety of different clustering algorithms is
huge. As the selection of the right clustering procedure is crucial to the
results of the data analysis, users are in need of support on their journey of
extracting knowledge from raw data. Thus, the objective of this paper lies in
the identification of a systematic selection logic for clustering algorithms
and corresponding validation concepts. The goal is to enable potential users to
choose an algorithm that fits best to their needs and the properties of their
underlying data clustering problem. Moreover, users are supported in selecting
the right validation concepts to make sense of the clustering results. Based on
a comprehensive literature review, this paper provides assessment criteria for
clustering method evaluation and validation concept selection. The criteria are
applied to several common algorithms and the selection process of an algorithm
is supported by the introduction of pseudocode-based routines that consider the
underlying data structure.
Related papers
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs [0.0]
We present a novel machine learning based framework in order to achieve optimal load profiling through a real case study.
arXiv Detail & Related papers (2023-10-31T11:23:26Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach [0.0]
Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together.
This article provides a deep description of the most widely used clustering methodologies.
It emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets.
arXiv Detail & Related papers (2022-07-14T14:22:36Z)
- Ensemble Method for Cluster Number Determination and Algorithm Selection in Unsupervised Learning [0.0]
Unsupervised learning suffers from the need for expertise in the field to be of use.
We propose an ensemble clustering framework which can be leveraged with minimal input.
arXiv Detail & Related papers (2021-12-23T04:59:10Z)
- DAC: Deep Autoencoder-based Clustering, a General Deep Learning Framework of Representation Learning [0.0]
We propose DAC, Deep Autoencoder-based Clustering, a data-driven framework to learn clustering representations using deep neural networks.
Experiment results show that our approach could effectively boost performance of the KMeans clustering algorithm on a variety of datasets.
arXiv Detail & Related papers (2021-02-15T11:31:00Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.