Investigating the Impact of Text Summarization on Topic Modeling
- URL: http://arxiv.org/abs/2410.09063v1
- Date: Sat, 28 Sep 2024 19:45:45 GMT
- Title: Investigating the Impact of Text Summarization on Topic Modeling
- Authors: Trishia Khandelwal
- Abstract summary: In this paper, an approach is proposed that further enhances topic modeling performance by utilizing a pre-trained large language model (LLM) to generate summaries of documents before inputting them into the topic model.
Few-shot prompting is used to generate summaries of different lengths to compare their impact on topic modeling.
The proposed method yields better topic diversity and comparable coherence values compared to previous models.
- Score: 13.581341206178525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic models are used to identify and group similar themes in a set of documents. Recent advancements in deep-learning-based neural topic models have received significant research interest. In this paper, an approach is proposed that further enhances topic modeling performance by utilizing a pre-trained large language model (LLM) to generate summaries of documents before inputting them into the topic model. Few-shot prompting is used to generate summaries of different lengths to compare their impact on topic modeling. This approach is particularly effective for larger documents because it helps capture the most essential information while reducing noise and irrelevant details that could obscure the overall theme. Additionally, it is observed that datasets exhibit an optimal summary length that leads to improved topic modeling performance. The proposed method yields better topic diversity and comparable coherence values compared to previous models.
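The pipeline described in the abstract (few-shot summarization, then topic modeling, then scoring topic diversity) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the function names `build_few_shot_prompt` and `summarize_corpus`, and the `llm` callable are all hypothetical, and `topic_diversity` uses one common definition (the fraction of unique words among the top-k words of all topics), which may differ from the exact metric used in the paper.

```python
from typing import Callable, List, Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]], document: str, max_words: int
) -> str:
    """Assemble a few-shot summarization prompt (hypothetical format)."""
    parts = [f"Summarize each document in at most {max_words} words."]
    # Each example is a (document, reference summary) pair shown to the model.
    for text, summary in examples:
        parts.append(f"Document: {text}\nSummary: {summary}")
    # The target document is left with an open "Summary:" slot to complete.
    parts.append(f"Document: {document}\nSummary:")
    return "\n\n".join(parts)


def summarize_corpus(
    docs: Sequence[str],
    examples: Sequence[Tuple[str, str]],
    max_words: int,
    llm: Callable[[str], str],
) -> List[str]:
    """Summarize every document; `llm` is any prompt-to-text callable (assumed)."""
    return [llm(build_few_shot_prompt(examples, d, max_words)).strip() for d in docs]


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top_k words of all topics."""
    top_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(top_words)) / len(top_words) if top_words else 0.0
```

To compare summary lengths, one would call `summarize_corpus` with several values of `max_words`, train the same topic model on each set of summaries, and compare the resulting `topic_diversity` and coherence scores.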
Related papers
- Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs [25.915607750636333]
We propose a novel approach that leverages large language models (LLMs) to extend short texts into more detailed sequences before applying topic modeling.
Our method significantly improves short-text topic modeling performance, as demonstrated by extensive experiments on real-world datasets with extreme data sparsity.
arXiv Detail & Related papers (2024-10-04T01:28:56Z)
- Iterative Improvement of an Additively Regularized Topic Model [0.0]
We present a method for iterative training of a topic model.
Experiments conducted on several collections of natural language texts show that the proposed ITAR model performs better than other popular topic models.
arXiv Detail & Related papers (2024-08-11T18:22:12Z)
- An Iterative Approach to Topic Modelling [0.0]
We propose to use an iterative process to perform topic modelling that gives rise to a sense of completeness of the resulting topics when the process is complete.
We demonstrate how the modelling process can be applied iteratively to arrive at a set of topics that could not be further improved upon using one of the three selected measures for clustering comparison.
arXiv Detail & Related papers (2024-07-25T09:26:07Z)
- GINopic: Topic Modeling with Graph Isomorphism Network [0.8962460460173959]
We introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words.
We demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
arXiv Detail & Related papers (2024-04-02T17:18:48Z)
- Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling [29.87929724277381]
In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation.
Existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics.
We extend short text into longer sequences using existing pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-24T00:23:30Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.