Related papers: Incremental hierarchical text clustering methods: a review

URL: http://arxiv.org/abs/2312.07769v1
Date: Tue, 12 Dec 2023 22:27:29 GMT
Title: Incremental hierarchical text clustering methods: a review
Authors: Fernando Simeone, Maik Olher Chaves, Ahmed Esmin
Abstract summary: This study analyzes various hierarchical and incremental clustering techniques. Its main contribution is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed the clustering of text documents.
Score: 49.32130498861987
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The growth in Internet usage has contributed to a large volume of continuously available data, and has created the need for automatic and efficient organization of the data. In this context, text clustering techniques are significant because they aim to organize documents according to their characteristics. More specifically, hierarchical and incremental clustering techniques can organize dynamic data in a hierarchical form, thus guaranteeing that this organization is kept up to date and that its exploration is facilitated. Based on the relevance and contemporary nature of the field, this study aims to analyze various hierarchical and incremental clustering techniques; the main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed the clustering of text documents. We describe the principal concepts related to the challenge and the different characteristics of these published works in order to provide a better understanding of the research in this field.

Related papers

Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking [0.9968037829925942]
This paper proposes a novel framework that enhances RAG by integrating hierarchical text segmentation and clustering. During inference, the framework retrieves information by leveraging both segment-level and cluster-level vector representations. Evaluations on the NarrativeQA, QuALITY, and QASPER datasets indicate that the proposed method achieved improved results compared to traditional chunking techniques.
arXiv Detail & Related papers (2025-07-14T05:21:58Z)
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation [129.27104172458363]
We develop a framework for organizing web pages in terms of both their topic and format. We automatically annotate pre-training data by distilling annotations from a large language model into efficient classifiers. Our work demonstrates that constructing and mixing domains provides a valuable complement to quality-based data curation methods.
arXiv Detail & Related papers (2025-02-14T18:02:37Z)
Data clustering: an essential technique in data science [28.124442353352183]
The paper highlights key principles underpinning clustering, outlines widely used tools and frameworks, and introduces the workflow of clustering in data science. The paper concludes with insights into future research directions, emphasizing clustering's role in driving innovation and enabling data-driven decision-making.
arXiv Detail & Related papers (2024-12-25T03:14:18Z)
HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation [15.188580557890942]
HiReview is a novel framework for hierarchical taxonomy-driven automatic literature review generation. Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-10-02T13:02:03Z)
Text Clustering with LLM Embeddings [0.0]
The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms. Recent advancements in large language models (LLMs) have the potential to enhance this task. Findings indicate that LLM embeddings are superior at capturing subtleties in structured language.
arXiv Detail & Related papers (2024-03-22T11:08:48Z)
A Comprehensive Survey of Text Classification Techniques and Their Research Applications: Observational and Experimental Insights [2.1436706159840013]
This survey paper introduces a comprehensive taxonomy specifically designed for text classification based on research fields. The taxonomy is structured into hierarchical levels: research field-based category, research field-based sub-category, methodology-based technique, methodology sub-technique, and research field applications.
arXiv Detail & Related papers (2024-01-11T08:17:42Z)
Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining. Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions [48.97008907275482]
Clustering is a fundamental machine learning task which has been widely studied in the literature. Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering.
arXiv Detail & Related papers (2022-06-15T15:05:13Z)
TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom. TaxoCom discovers novel sub-topic clusters of terms and documents. Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
Path Based Hierarchical Clustering on Knowledge Graphs [1.713291434132985]
We present a novel approach for inducing a hierarchy of subject clusters. Our method first constructs a tag hierarchy before assigning subjects to clusters on this hierarchy. We quantitatively demonstrate our method's ability to induce a coherent cluster hierarchy on three real-world datasets.
arXiv Detail & Related papers (2021-09-27T16:42:43Z)
A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.