A Rapid Review of Clustering Algorithms
- URL: http://arxiv.org/abs/2401.07389v1
- Date: Sun, 14 Jan 2024 23:19:53 GMT
- Title: A Rapid Review of Clustering Algorithms
- Authors: Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland
Astudillo, Shengyuan Cao
- Abstract summary: Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data.
They play an important role in many areas of modern life, such as marketing and e-commerce, healthcare, data organization and analysis, and social media.
We analyze existing clustering algorithms and classify mainstream algorithms across five different dimensions.
- Score: 5.46715422237599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering algorithms aim to organize data into groups or clusters based on
the inherent patterns and similarities within the data. They play an important
role in many areas of modern life, such as marketing and e-commerce,
healthcare, data organization and analysis, and social media. Numerous
clustering algorithms exist, and new ones continue to be developed. Each
algorithm has its own strengths and weaknesses, and as of now there is no
universally applicable algorithm for all tasks. In this work, we analyze
existing clustering algorithms and classify mainstream algorithms across five
dimensions: underlying principles and characteristics, assignment of data
points to clusters, dataset capacity, whether the number of clusters must be
predefined, and application area. This classification helps researchers
understand clustering algorithms from various perspectives and identify
algorithms suited to specific tasks. Finally, we discuss current trends and
potential future directions in clustering algorithms, and we identify and
discuss open challenges and unresolved issues in the field.
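
One of the five dimensions above, whether the number of clusters must be predefined, can be illustrated by contrasting two mainstream algorithms: KMeans requires the cluster count up front, while DBSCAN derives clusters from density and discovers their number. The snippet below is a minimal, hypothetical illustration using scikit-learn on synthetic data; the dataset and parameter values (e.g., eps=0.5) are arbitrary assumptions for demonstration and are not taken from the paper.

```python
# Illustrative only: an algorithm that needs a predefined cluster count
# (KMeans) versus one that does not (DBSCAN). Data and parameters are
# arbitrary assumptions, not from the surveyed paper.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score

# Synthetic 2-D data drawn from three Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# KMeans: the number of clusters must be supplied in advance.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# DBSCAN: clusters emerge from density; eps and min_samples are assumed values.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("KMeans clusters:", len(set(kmeans_labels)))
print("DBSCAN clusters (excluding noise):",
      len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0))
print("KMeans silhouette:", round(silhouette_score(X, kmeans_labels), 3))
```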
Related papers
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions (a simplified greedy k-center sketch appears after this list).
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback for the actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Fairness Degrading Adversarial Attacks Against Clustering Algorithms [35.40427659749882]
We propose a fairness degrading attack algorithm for k-median clustering.
We find that the addition of the generated adversarial samples can lead to significantly lower fairness values.
arXiv Detail & Related papers (2021-10-22T19:10:27Z) - A review of systematic selection of clustering algorithms and their
evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z) - Fair Clustering Using Antidote Data [35.40427659749882]
We propose an alternate approach to fairness in clustering where we augment the original dataset with a small number of data points, called antidote data.
Our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.
arXiv Detail & Related papers (2021-06-01T16:07:52Z) - DAC: Deep Autoencoder-based Clustering, a General Deep Learning
Framework of Representation Learning [0.0]
We propose DAC, Deep Autoencoder-based Clustering, a data-driven framework to learn clustering representations using deep neural networks.
Experiment results show that our approach can effectively boost the performance of the KMeans clustering algorithm on a variety of datasets (a generic autoencoder-plus-KMeans sketch appears after this list).
arXiv Detail & Related papers (2021-02-15T11:31:00Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z) - A semi-supervised sparse K-Means algorithm [3.04585143845864]
An unsupervised sparse clustering method can be employed in order to detect the subgroup of features necessary for clustering.
A semi-supervised method can use the labelled data to create constraints and enhance the clustering solution.
We show that the algorithm maintains the high performance of other semi-supervised algorithms while preserving the ability to distinguish informative from uninformative features.
arXiv Detail & Related papers (2020-03-16T02:05:23Z)
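
As background for the Weighted K-Center entry above, the sketch below shows the classic greedy (farthest-point) k-center heuristic for selecting a representative subset. It is a simplified illustration only, assuming plain Euclidean distances and synthetic data; it is not the paper's weighted 3-approximation algorithm and does not include the uncertainty sampling term.

```python
# Minimal greedy (farthest-point) k-center sketch for subset selection.
# Background illustration only; NOT the paper's weighted 3-approximation.
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Pick k points so the maximum distance from any point to its
    nearest chosen center is (approximately) minimized."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [int(rng.integers(n))]              # start from a random point
    # distance from every point to its nearest chosen center so far
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                # farthest point becomes a center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers, dist.max()                    # selected subset and coverage radius

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(500, 10))   # hypothetical unlabeled pool
    subset, radius = greedy_k_center(X, k=20)
    print("selected indices:", subset[:5], "... covering radius:", round(radius, 3))
```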
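
For the DAC entry above, the sketch below illustrates the generic autoencoder-then-KMeans pattern: train a fully connected autoencoder on reconstruction loss, then run KMeans on the learned latent representations. The architecture, training schedule, and synthetic data are assumptions made for illustration; this is not the authors' DAC implementation.

```python
# Generic autoencoder-then-KMeans sketch (PyTorch + scikit-learn).
# Layer sizes, epochs, and synthetic data are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic high-dimensional data with a latent cluster structure.
X_np, _ = make_blobs(n_samples=1000, n_features=50, centers=4, random_state=0)
X = torch.tensor(X_np, dtype=torch.float32)

latent_dim = 8
encoder = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 50))
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

# Train the autoencoder to reconstruct its input (full batch for brevity).
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(decoder(encoder(X)), X)
    loss.backward()
    optimizer.step()

# Cluster in the learned latent space instead of the raw input space.
with torch.no_grad():
    Z = encoder(X).numpy()
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
print("reconstruction loss:", round(loss.item(), 4),
      "| cluster sizes:", np.bincount(labels))
```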