Estimating the Effective Topics of Articles and journals Abstract Using LDA And K-Means Clustering Algorithm
- URL: http://arxiv.org/abs/2508.16046v1
- Date: Fri, 22 Aug 2025 02:51:33 GMT
- Title: Estimating the Effective Topics of Articles and journals Abstract Using LDA And K-Means Clustering Algorithm
- Authors: Shadikur Rahman, Umme Ayman Koana, Aras M. Ismael, Karmand Hussein Abdalla,
- Abstract summary: We have used LDA, K-Means cluster and also lexical database WordNet for keyphrases extraction in our text documents.<n>This study will help the researcher to make a search string based on journals and articles by avoiding misunderstandings.
- Score: 0.2765128393602164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing journals and articles abstract text or documents using topic modelling and text clustering has become a modern solution for the increasing number of text documents. Topic modelling and text clustering are both intensely involved tasks that can benefit one another. Text clustering and topic modelling algorithms are used to maintain massive amounts of text documents. In this study, we have used LDA, K-Means cluster and also lexical database WordNet for keyphrases extraction in our text documents. K-Means cluster and LDA algorithms achieve the most reliable performance for keyphrase extraction in our text documents. This study will help the researcher to make a search string based on journals and articles by avoiding misunderstandings.
Related papers
- Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering [59.54662810933882]
Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models, often lack coherence and granularity.<n>We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering.
arXiv Detail & Related papers (2025-09-23T15:12:58Z) - An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook.<n>Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches.<n>We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts.<n>Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z) - DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding.<n>Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z) - Cequel: Cost-Effective Querying of Large Language Models for Text Clustering [15.179854529085544]
Text clustering aims to automatically partition a collection of documents into coherent groups based on their linguistic features.<n>Recent advances in large language models (LLMs) have significantly improved this field by providing high-quality contextualized embeddings.<n>We propose Cequel, a cost-effective framework that achieves accurate text clustering under a limited budget of LLM queries.
arXiv Detail & Related papers (2025-04-22T06:57:49Z) - Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments demonstrate on two datasets from different domains, that LLMs fine-tuned with the auxiliary task generate higher quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z) - Text Clustering as Classification with LLMs [9.128151647718251]
We propose a novel framework that reframes text clustering as a classification task by harnessing the in-context learning capabilities of Large Language Models.<n>By leveraging the advanced natural language understanding and generalization capabilities of LLMs, the proposed approach enables effective clustering with minimal human intervention.<n> Experimental results on diverse datasets demonstrate that our framework achieves comparable or superior performance to state-of-the-art embedding-based clustering techniques.
arXiv Detail & Related papers (2024-09-30T16:57:34Z) - PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus.
The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z) - Text Clustering with Large Language Model Embeddings [0.0]
The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms.<n>Recent advancements in large language models (LLMs) have the potential to enhance this task.<n>Findings indicate that LLM embeddings are superior at capturing subtleties in structured language.
arXiv Detail & Related papers (2024-03-22T11:08:48Z) - Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that aimed to texts documents clustering.
arXiv Detail & Related papers (2023-12-12T22:27:29Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - BATS: A Spectral Biclustering Approach to Single Document Topic Modeling
and Segmentation [17.003488045214972]
Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available.
In developing a methodology to handle single documents, we face two major challenges.
First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms.
Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments.
arXiv Detail & Related papers (2020-08-05T16:34:33Z) - GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [1.0681288493631977]
We introduce Global and Local Embedding Automatic Keyphrase Extractor (GLEAKE) for the task of automatic keyphrase extraction.
GLEAKE uses single and multi-word embedding techniques to explore the syntactic and semantic aspects of the candidate phrases.
It refines the most significant phrases as a final set of keyphrases.
arXiv Detail & Related papers (2020-05-19T20:24:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.