Related papers: Data clustering: an essential technique in data science

Data clustering: an essential technique in data science

URL: http://arxiv.org/abs/2412.18760v2
Date: Thu, 30 Jan 2025 21:57:33 GMT
Title: Data clustering: an essential technique in data science
Authors: Tai Dinh, Wong Hauchi, Daniil Lisik, Michal Koren, Dat Tran, Philip S. Yu, Joaquín Torres-Sospedra,
Abstract summary: The paper highlights key principles underpinning clustering, outlines widely used tools and frameworks, and introduces the workflow of clustering in data science.<n>The paper concludes with insights into future research directions, emphasizing clustering's role in driving innovation and enabling data-driven decision-making.
Score: 28.124442353352183
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper explores the critical role of data clustering in data science, emphasizing its methodologies, tools, and diverse applications. Traditional techniques, such as partitional and hierarchical clustering, are analyzed alongside advanced approaches such as data stream, density-based, graph-based, and model-based clustering for handling complex structured datasets. The paper highlights key principles underpinning clustering, outlines widely used tools and frameworks, introduces the workflow of clustering in data science, discusses challenges in practical implementation, and examines various applications of clustering. By focusing on these foundations and applications, the discussion underscores clustering's transformative potential. The paper concludes with insights into future research directions, emphasizing clustering's role in driving innovation and enabling data-driven decision-making.

Related papers

Mixed Data Clustering Survey and Challenges [0.0]
This paper introduces a clustering method grounded in pretopological spaces.<n> benchmarking against classical numerical clustering algorithms yields insights into the performance and effectiveness of the proposed method.
arXiv Detail & Related papers (2025-11-27T08:20:05Z)
Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering [59.54662810933882]
Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models, often lack coherence and granularity.<n>We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering.
arXiv Detail & Related papers (2025-09-23T15:12:58Z)
Categorical data clustering: 25 years beyond K-modes [1.545264698293902]
categorical data clustering is a common and important task in computer science. This review provides a comprehensive synthesis of categorical data clustering in the past twenty-five years. It elucidates the pivotal role of categorical data clustering in diverse fields such as health sciences, natural sciences, social sciences, education, engineering and economics.
arXiv Detail & Related papers (2024-08-30T12:36:00Z)
Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques. The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that aimed to texts documents clustering.
arXiv Detail & Related papers (2023-12-12T22:27:29Z)
Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework. We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z)
Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining. Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
Review of Clustering Methods for Functional Data [1.827510863075184]
The review aims to bridge the gap between the functional data analysis community and the clustering community. We propose a systematic taxonomy that explores the connections and differences among the existing functional data clustering methods.
arXiv Detail & Related papers (2022-10-03T12:15:23Z)
A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions [48.97008907275482]
Clustering is a fundamental machine learning task which has been widely studied in the literature. Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering.
arXiv Detail & Related papers (2022-06-15T15:05:13Z)
A Proposition-Level Clustering Approach for Multi-Document Summarization [82.4616498914049]
We revisit the clustering approach, grouping together propositions for more precise information alignment. Our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions. Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets.
arXiv Detail & Related papers (2021-12-16T10:34:22Z)
CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models. We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)
A Comprehensive Survey on Community Detection with Deep Learning [93.40332347374712]
A community reveals the features and connections of its members that are different from those in other communities in a network. This survey devises and proposes a new taxonomy covering different categories of the state-of-the-art methods. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders.
arXiv Detail & Related papers (2021-05-26T14:37:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.