BERTopic: Neural topic modeling with a class-based TF-IDF procedure
- URL: http://arxiv.org/abs/2203.05794v1
- Date: Fri, 11 Mar 2022 08:35:15 GMT
- Title: BERTopic: Neural topic modeling with a class-based TF-IDF procedure
- Authors: Maarten Grootendorst
- Abstract summary: We present BERTopic, a topic model that extends the approach of treating topic modeling as a clustering task.
BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach to topic modeling.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic models can be useful tools to discover latent topics in collections of
documents. Recent studies have shown the feasibility of approaching topic modeling
as a clustering task. We present BERTopic, a topic model that extends this
process by extracting coherent topic representations through the development of
a class-based variation of TF-IDF. More specifically, BERTopic generates
document embeddings with pre-trained transformer-based language models, clusters
these embeddings, and finally generates topic representations with the
class-based TF-IDF procedure. BERTopic generates coherent topics and remains
competitive across a variety of benchmarks involving classical models and those
that follow the more recent clustering approach to topic modeling.
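The pipeline above (embed, cluster, then weight terms per cluster) maps directly onto the open-source bertopic package. Below is a minimal sketch of its documented fit/transform usage, followed by a from-scratch illustration of the class-based TF-IDF weighting; the weighting formula follows the paper, while the dataset choice and the class_tfidf helper are illustrative assumptions, not the library's internal API.

    # Minimal sketch of the BERTopic pipeline using the open-source bertopic
    # package: embed documents, cluster the embeddings, extract topic words.
    from bertopic import BERTopic
    from sklearn.datasets import fetch_20newsgroups

    docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data
    topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")  # sentence-transformers model
    topics, probs = topic_model.fit_transform(docs)  # one topic id per document
    print(topic_model.get_topic(0))                  # top words with their c-TF-IDF weights

    # From-scratch sketch of the class-based TF-IDF step. All documents in a
    # cluster are treated as one "class document"; term t in class c is weighted
    # W(t, c) = tf(t, c) * log(1 + A / f(t)), with A the average number of words
    # per class and f(t) the frequency of t across all classes (formula from the
    # paper; class_tfidf itself is a hypothetical helper, not the library's API).
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    def class_tfidf(class_docs):
        counts = CountVectorizer().fit_transform(class_docs).toarray()  # classes x terms
        tf = counts / counts.sum(axis=1, keepdims=True)  # term frequency within each class
        A = counts.sum() / counts.shape[0]               # average word count per class
        f_t = counts.sum(axis=0)                         # term frequency across all classes
        return tf * np.log(1 + A / f_t)                  # class-based TF-IDF weights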
Related papers
- Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms [6.349503549199403]
This study introduces an end-to-end semantic-driven topic modeling technique for topic extraction.
Our model generates document embeddings using pre-trained transformer-based language models.
Compared to ChatGPT and traditional topic modeling algorithms, our model provides more coherent and meaningful topics.
arXiv Detail & Related papers (2024-09-30T18:15:31Z)
- Iterative Improvement of an Additively Regularized Topic Model [0.0]
We present a method for iterative training of a topic model.
Experiments conducted on several collections of natural language texts show that the proposed ITAR model performs better than other popular topic models.
arXiv Detail & Related papers (2024-08-11T18:22:12Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach to label-name supervised topic modeling.
EdTM casts topic modeling as an assignment problem, leveraging LM/LLM-based document-topic affinities (a rough sketch of such an assignment appears after this list).
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- GINopic: Topic Modeling with Graph Isomorphism Network [0.8962460460173959]
We introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words.
We demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
arXiv Detail & Related papers (2024-04-02T17:18:48Z)
- Controllable Topic-Focused Abstractive Summarization [57.8015120583044]
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects.
This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z)
- Federated Neural Topic Models [0.0]
Federated topic modeling allows multiple parties to jointly train a topic model without sharing their data.
We propose and analyze a federated implementation based on state-of-the-art neural topic modeling implementations.
In practice, our approach is equivalent to centralized model training, but preserves the privacy of the nodes.
arXiv Detail & Related papers (2022-12-05T13:49:26Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
- Neural Topic Modeling with Cycle-Consistent Adversarial Training [17.47328718035538]
We propose Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT) and its supervised version, sToMCAT.
ToMCAT employs a generator network to interpret topics and an encoder network to infer document topics.
sToMCAT extends ToMCAT by incorporating document labels into the topic modeling process to help discover more coherent topics.
arXiv Detail & Related papers (2020-09-29T12:41:27Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model inter-topic generalization.
We also propose a prototype encoding structure to model intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
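As referenced in the Interactive Topic Models with Optimal Transport entry above, topic assignment can be cast as an optimal-transport problem over document-topic affinities. The sketch below shows a generic entropic (Sinkhorn) assignment; it illustrates the general idea only, and the function name, regularization value, and uniform marginals are illustrative assumptions, not details from the EdTM paper.

    # Generic sketch: soft document-topic assignment via entropic optimal
    # transport (Sinkhorn iterations). Illustrative only; not EdTM's actual code.
    import numpy as np

    def sinkhorn_assignment(affinity, reg=0.1, n_iters=200):
        # affinity: (n_docs, n_topics) scores, e.g. cosine similarity between
        # document embeddings and topic/label-name embeddings (assumed inputs).
        K = np.exp(affinity / reg)                 # Gibbs kernel (cost = -affinity)
        a = np.full(K.shape[0], 1.0 / K.shape[0])  # uniform mass over documents
        b = np.full(K.shape[1], 1.0 / K.shape[1])  # uniform mass over topics
        u = np.ones_like(a)
        for _ in range(n_iters):
            v = b / (K.T @ u)                      # alternating marginal scaling
            u = a / (K @ v)
        plan = u[:, None] * K * v[None, :]         # transport plan, rows = documents
        return plan / plan.sum(axis=1, keepdims=True)  # per-document topic weights

    rng = np.random.default_rng(0)
    weights = sinkhorn_assignment(rng.normal(size=(5, 3)))
    print(weights.round(2))                        # each row sums to 1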
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.