Incremental hierarchical text clustering methods: a review
- URL: http://arxiv.org/abs/2312.07769v1
- Date: Tue, 12 Dec 2023 22:27:29 GMT
- Title: Incremental hierarchical text clustering methods: a review
- Authors: Fernando Simeone, Maik Olher Chaves, Ahmed Esmin
- Abstract summary: This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.
- Score: 49.32130498861987
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The growth in Internet usage has contributed to a large volume of
continuously available data, and has created the need for automatic and
efficient organization of the data. In this context, text clustering techniques
are significant because they aim to organize documents according to their
characteristics. More specifically, hierarchical and incremental clustering
techniques can organize dynamic data in a hierarchical form, keeping this
organization up to date and making it easier to explore. Based on
the relevance and contemporary nature of the field, this study aims to analyze
various hierarchical and incremental clustering techniques; the main
contribution of this research is the organization and comparison of the
techniques used by studies published between 2010 and 2018 that addressed text
document clustering. We describe the principal concepts related to the
challenge and the different characteristics of these published works in order
to provide a better understanding of the research in this field.
Related papers
- HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation [15.188580557890942]
HiReview is a novel framework for hierarchical taxonomy-driven automatic literature review generation.
Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-10-02T13:02:03Z)
- Categorical data clustering: 25 years beyond K-modes [1.545264698293902]
Categorical data clustering is a common and important task in computer science.
This review provides a comprehensive synthesis of categorical data clustering in the past twenty-five years.
It elucidates the pivotal role of categorical data clustering in diverse fields such as health sciences, natural sciences, social sciences, education, engineering and economics.
arXiv Detail & Related papers (2024-08-30T12:36:00Z)
- Text Clustering with LLM Embeddings [0.0]
The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms.
Recent advancements in large language models (LLMs) have the potential to enhance this task.
Findings indicate that LLM embeddings are superior at capturing subtleties in structured language.
arXiv Detail & Related papers (2024-03-22T11:08:48Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions [48.97008907275482]
Clustering is a fundamental machine learning task which has been widely studied in the literature.
Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community.
We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering.
arXiv Detail & Related papers (2022-06-15T15:05:13Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- Path Based Hierarchical Clustering on Knowledge Graphs [1.713291434132985]
We present a novel approach for inducing a hierarchy of subject clusters.
Our method first constructs a tag hierarchy before assigning subjects to clusters on this hierarchy.
We quantitatively demonstrate our method's ability to induce a coherent cluster hierarchy on three real-world datasets.
arXiv Detail & Related papers (2021-09-27T16:42:43Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in text classification due to the unprecedented success of deep learning.
This paper reviews the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.