Related papers: Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling

URL: http://arxiv.org/abs/2409.15626v1
Date: Tue, 24 Sep 2024 00:09:41 GMT
Title: Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling
Authors: Satya Kapoor, Alex Gil, Sreyoshi Bhaduri, Anshul Mittal, Rutu Mulkar,
Abstract summary: We present a novel approach that integrates large language models with existing clustering-based topic modeling approaches. We evaluate our approach on a large corpus of news articles and demonstrate substantial improvements in topic coherence and topic diversity.
Score: 1.0949553365997655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Topic modeling is a widely used technique for uncovering thematic structures from large text corpora. However, most topic modeling approaches, e.g., Latent Dirichlet Allocation (LDA), struggle to capture the nuanced semantics and contextual understanding required to accurately model complex narratives. Recent advancements in this area include methods like BERTopic, which have demonstrated significantly improved topic coherence and thus established a new standard for benchmarking. In this paper, we present a novel approach, the Qualitative Insights Tool (QualIT), that integrates large language models (LLMs) with existing clustering-based topic modeling approaches. Our method leverages the deep contextual understanding and powerful language generation capabilities of LLMs to enrich the topic modeling process using clustering. We evaluate our approach on a large corpus of news articles and demonstrate substantial improvements in topic coherence and topic diversity compared to baseline topic modeling techniques. On the 20 ground-truth topics, our method shows 70% topic coherence (vs. 65% and 57% for the benchmarks) and 95.5% topic diversity (vs. 85% and 72% for the benchmarks). Our findings suggest that the integration of LLMs can unlock new opportunities for topic modeling of dynamic and complex text data, as is common in talent management research contexts.
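
The abstract reports topic diversity but does not say how it is computed. A minimal sketch, assuming the commonly used definition (the share of unique words among the top-k words of all topics), is given below. The function name, the `top_k` parameter, and the toy topics are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Share of unique words among the top_k words of every topic.

    Assumed definition (not confirmed by the abstract): 1.0 means no word
    is shared between topics, values near 0 mean heavily redundant topics.
    """
    # Collect the top_k words of each topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics; "policy" appears in two of them.
example_topics = [
    ["election", "vote", "senate", "policy"],
    ["game", "team", "season", "coach"],
    ["market", "stocks", "policy", "trade"],
]

diversity = topic_diversity(example_topics, top_k=4)
print(f"{diversity:.3f}")  # 11 unique words out of 12 -> 0.917
```

Under this definition, a score of 95.5% would mean that almost none of the top words are repeated across topics.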

Related papers

Retrieval Augmented Generation for Topic Modeling in Organizational Research: An Introduction with Empirical Demonstration [0.0]
This paper introduces Agentic Retrieval-Augmented Generation (Agentic RAG) as a method for topic modeling with LLMs. It integrates three key components: (1) retrieval, enabling automated access to external data beyond an LLM's pre-trained knowledge; (2) generation, leveraging LLM capabilities for text synthesis; and (3) agent-driven learning, iteratively refining retrieval and query formulation processes. Our findings demonstrate that the approach is more efficient and interpretable, and at the same time achieves higher reliability and validity, than the standard machine learning approach.
arXiv Detail & Related papers (2025-02-28T11:25:11Z)
Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation [0.0]
This study presents a framework for automated evaluation of dynamically evolving topics in scientific literature using Large Language Models (LLMs). The proposed approach harnesses LLMs to measure key quality dimensions, such as coherence, repetitiveness, diversity, and topic-document alignment, without heavy reliance on expert annotators or narrow statistical metrics.
arXiv Detail & Related papers (2025-02-11T08:23:56Z)
LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework [0.0]
Large language models (LLMs) offer potential for dynamic topic refinement and discovery, yet their application often incurs high API costs. To address these challenges, we propose the LLM-assisted Iterative Topic Augmentation framework (LITA). LITA integrates user-provided seeds with embedding-based clustering and iterative refinement.
arXiv Detail & Related papers (2024-12-17T01:43:44Z)
A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources. We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs [25.915607750636333]
We propose a novel approach that leverages large language models (LLMs) to extend short texts into more detailed sequences before applying topic modeling. Our method significantly improves short-text topic modeling performance, as demonstrated by extensive experiments on real-world datasets with extreme data sparsity.
arXiv Detail & Related papers (2024-10-04T01:28:56Z)
Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms [6.349503549199403]
This study introduces an innovative end-to-end semantic-driven topic modeling technique for the topic extraction process. Our model generates document embeddings using pre-trained transformer-based language models. Compared to ChatGPT and traditional topic modeling algorithms, our model provides more coherent and meaningful topics.
arXiv Detail & Related papers (2024-09-30T18:15:31Z)
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community. There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z)
Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label name supervised topic modeling. EdTM treats topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement [7.6115889231452964]
We introduce a novel approach termed "Topic Refinement". This approach does not directly involve itself in the initial modeling of topics but focuses on improving topics after they have been mined. By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically.
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling [0.9095496510579351]
We investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora. Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics.
arXiv Detail & Related papers (2024-03-24T17:39:51Z)
Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
A latent diffusion model (LDM) is an effective minimalist for in-context segmentation. We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z)
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings. Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it. We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.