Effective Neural Topic Modeling with Embedding Clustering Regularization
- URL: http://arxiv.org/abs/2306.04217v1
- Date: Wed, 7 Jun 2023 07:45:38 GMT
- Title: Effective Neural Topic Modeling with Embedding Clustering Regularization
- Authors: Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Anh Tuan Luu
- Abstract summary: We propose a new neural topic model, the Embedding Clustering Regularization Topic Model (ECRTM).
ECRTM forces each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space.
Our ECRTM generates diverse and coherent topics together with high-quality topic distributions of documents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic models have been prevalent for decades with various applications.
However, existing topic models commonly suffer from the notorious topic
collapsing: discovered topics semantically collapse towards each other, leading
to highly repetitive topics, insufficient topic discovery, and damaged model
interpretability. In this paper, we propose a new neural topic model, Embedding
Clustering Regularization Topic Model (ECRTM). Besides the existing
reconstruction error, we propose a novel Embedding Clustering Regularization
(ECR), which forces each topic embedding to be the center of a separately
aggregated word embedding cluster in the semantic space. This enables each
produced topic to contain distinct word semantics, which alleviates topic
collapsing. Regularized by ECR, our ECRTM generates diverse and coherent topics
together with high-quality topic distributions of documents. Extensive
experiments on benchmark datasets demonstrate that ECRTM effectively addresses
the topic collapsing issue and consistently surpasses state-of-the-art
baselines in terms of topic quality, topic distributions of documents, and
downstream classification tasks.
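The core of ECR is a clustering objective in the shared semantic space of word and topic embeddings. Below is a minimal, hypothetical PyTorch sketch of that idea, assuming a Euclidean cost between word and topic embeddings and a Sinkhorn-style soft assignment with uniform marginals; the paper's exact formulation, marginals, and hyperparameters may differ.

```python
# Hypothetical sketch of an embedding-clustering regularizer in the spirit of
# ECR: treat topic embeddings as cluster centers for word embeddings and
# penalize the cost of softly assigning each word to a topic. Written for
# clarity, not numerical stability.
import torch

def ecr_loss(word_emb, topic_emb, epsilon=0.05, n_iters=50):
    """word_emb: (V, D) word embeddings; topic_emb: (K, D) topic embeddings."""
    V, K = word_emb.size(0), topic_emb.size(0)
    # Pairwise Euclidean cost between every word and every topic embedding.
    cost = torch.cdist(word_emb, topic_emb, p=2)   # (V, K)
    # Entropic optimal transport via Sinkhorn iterations with uniform
    # marginals, so every topic must anchor its own word cluster.
    a = torch.full((V,), 1.0 / V, device=cost.device)
    b = torch.full((K,), 1.0 / K, device=cost.device)
    Kmat = torch.exp(-cost / epsilon)              # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (Kmat.t() @ u)
        u = a / (Kmat @ v)
    plan = u.unsqueeze(1) * Kmat * v.unsqueeze(0)  # (V, K) soft assignment
    return (plan * cost).sum()
```

In use, a regularizer like this would be added to the topic model's training objective, e.g. `reconstruction_loss + weight * ecr_loss(word_emb, topic_emb)`, so that gradients pull each topic embedding toward the center of its own word embedding cluster.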
Related papers
- Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement
We introduce a novel approach termed "Topic Refinement".
Rather than intervening in the initial modeling of topics, this approach focuses on improving topics after they have been mined.
By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically.
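As a concrete illustration of this prompting step, here is a minimal, hypothetical sketch; the model name, client library, and prompt wording are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of LLM-based topic refinement via prompting: ask a chat
# model to drop or replace off-topic words in a mined topic.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def refine_topic(topic_words):
    prompt = (
        "The following words were mined as one topic: "
        + ", ".join(topic_words)
        + ". Remove words that do not fit the topic semantically, replacing "
        "each removed word with a better-fitting one. Return only a "
        "comma-separated word list."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of model
        messages=[{"role": "user", "content": prompt}],
    )
    return [w.strip() for w in resp.choices[0].message.content.split(",")]
```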
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms baseline methods in terms of output quality.
arXiv Detail & Related papers (2022-10-18T22:38:49Z)
- Knowledge-Aware Bayesian Deep Topic Model
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherence and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- TopicNet: Semantic Graph-Guided Topic Discovery
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Author Clustering and Topic Estimation for Short Texts
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Neural Topic Modeling with Cycle-Consistent Adversarial Training
We propose Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT) and its supervised version sToMCAT.
ToMCAT employs a generator network to interpret topics and an encoder network to infer document topics.
sToMCAT extends ToMCAT by incorporating document labels into the topic modeling process to help discover more coherent topics.
arXiv Detail & Related papers (2020-09-29T12:41:27Z)
- Topic-Aware Multi-turn Dialogue Modeling
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Context Reinforced Neural Topic Modeling over Short Texts
We propose a Context Reinforced Neural Topic Model (CRNTM).
CRNTM infers the topic for each word in a narrow range by assuming that each short text covers only a few salient topics.
Experiments on two benchmark datasets validate the effectiveness of the proposed model on both topic discovery and text classification.
arXiv Detail & Related papers (2020-08-11T06:41:53Z)
- Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!
We propose an alternative way to obtain topics: clustering pre-trained word embeddings while incorporating document information for weighted clustering and reranking top words.
The best-performing combination of our approach performs as well as classical topic models, but with lower runtime and computational complexity.
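A minimal sketch of this clustering recipe follows, assuming scikit-learn KMeans, document-frequency weighting, and a distance-based reranking of top words; the paper's actual weighting and reranking choices may differ.

```python
# Hypothetical sketch of topic discovery by clustering pre-trained word
# embeddings, weighting words by corpus frequency and reranking cluster
# members to form topics (names and weighting scheme are illustrative).
import numpy as np
from sklearn.cluster import KMeans

def cluster_topics(words, embeddings, doc_freq, n_topics=20, top_n=10):
    """words: list of V words; embeddings: (V, D) array; doc_freq: (V,) counts."""
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(embeddings)
    topics = []
    for k in range(n_topics):
        idx = np.where(km.labels_ == k)[0]
        # Rerank cluster members: prefer words that are both close to the
        # centroid and frequent in the corpus.
        dist = np.linalg.norm(embeddings[idx] - km.cluster_centers_[k], axis=1)
        score = doc_freq[idx] / (1.0 + dist)
        topics.append([words[i] for i in idx[np.argsort(-score)][:top_n]])
    return topics
```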
arXiv Detail & Related papers (2020-04-30T16:18:18Z)