Neural Topic Modeling with Cycle-Consistent Adversarial Training
- URL: http://arxiv.org/abs/2009.13971v1
- Date: Tue, 29 Sep 2020 12:41:27 GMT
- Title: Neural Topic Modeling with Cycle-Consistent Adversarial Training
- Authors: Xuemeng Hu, Rui Wang, Deyu Zhou, Yuxuan Xiong
- Abstract summary: We propose Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT) and its supervised version sToMCAT.
ToMCAT employs a generator network to interpret topics and an encoder network to infer document topics.
sToMCAT extends ToMCAT by incorporating document labels into the topic modeling process to help discover more coherent topics.
- Score: 17.47328718035538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in deep generative models have attracted significant research
interest in neural topic modeling. The recently proposed Adversarial-neural
Topic Model models topics with an adversarially trained generator network and
employs a Dirichlet prior to capture the semantic patterns in latent topics. It
is effective in discovering coherent topics but unable to infer topic
distributions for given documents or utilize available document labels. To
overcome such limitations, we propose Topic Modeling with Cycle-consistent
Adversarial Training (ToMCAT) and its supervised version sToMCAT. ToMCAT
employs a generator network to interpret topics and an encoder network to infer
document topics. Adversarial training and cycle-consistent constraints are used
to encourage the generator and the encoder to produce realistic samples that
coordinate with each other. sToMCAT extends ToMCAT by incorporating document
labels into the topic modeling process to help discover more coherent topics.
The effectiveness of the proposed models is evaluated on
unsupervised/supervised topic modeling and text classification. The
experimental results show that our models can produce both coherent and
informative topics, outperforming a number of competitive baselines.
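Below is a minimal PyTorch sketch of the cycle-consistency idea the abstract describes: a generator maps Dirichlet-sampled topic distributions to word distributions, an encoder maps documents back to topics, and both reconstruction directions are penalized. The layer sizes, L1 penalty, and Dirichlet concentration are illustrative assumptions, not the authors' configuration; the adversarial discriminators are omitted.

```python
# Minimal sketch of ToMCAT-style cycle consistency (assumed sizes/weights).
import torch
import torch.nn as nn

VOCAB, TOPICS = 2000, 20

# Generator: topic distribution -> word distribution over the vocabulary.
G = nn.Sequential(nn.Linear(TOPICS, 256), nn.ReLU(),
                  nn.Linear(256, VOCAB), nn.Softmax(dim=-1))
# Encoder: document word distribution -> topic distribution.
E = nn.Sequential(nn.Linear(VOCAB, 256), nn.ReLU(),
                  nn.Linear(256, TOPICS), nn.Softmax(dim=-1))

def cycle_loss(docs, alpha=0.1):
    """Cycle terms: E(G(theta)) should recover theta, G(E(d)) should recover d."""
    prior = torch.distributions.Dirichlet(torch.full((docs.size(0), TOPICS), alpha))
    theta = prior.sample()                            # topics drawn from the prior
    topic_cycle = (E(G(theta)) - theta).abs().mean()  # topic -> doc -> topic
    doc_cycle = (G(E(docs)) - docs).abs().mean()      # doc -> topic -> doc
    return topic_cycle + doc_cycle

docs = torch.rand(8, VOCAB)
docs = docs / docs.sum(-1, keepdim=True)  # normalized bag-of-words batch
print(cycle_loss(docs))
```

In the full model these cycle terms would be combined with adversarial losses that push the generator's outputs toward realistic documents and the encoder's outputs toward the Dirichlet prior.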
Related papers
- Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms [6.349503549199403]
This study introduces an end-to-end semantic-driven topic modeling technique for topic extraction.
Our model generates document embeddings using pre-trained transformer-based language models.
Compared to ChatGPT and traditional topic modeling algorithms, our model provides more coherent and meaningful topics.
arXiv Detail & Related papers (2024-09-30T18:15:31Z)
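A hedged sketch of the embed-then-cluster pipeline this entry describes, assuming the sentence-transformers and scikit-learn packages; the model name, toy corpus, and cluster count are placeholders, not the paper's configuration.

```python
# Hedged sketch: embed documents with a pretrained transformer, then cluster.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

docs = ["stocks rallied after strong earnings",
        "the team won the cup final",
        "central bank raises interest rates",
        "striker scores twice in derby"]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained encoder
embeddings = embedder.encode(docs)                  # (n_docs, dim) array

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
for doc, label in zip(docs, labels):
    print(label, doc)  # each cluster is treated as one topic
```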
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM casts topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
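The assignment view can be illustrated with a few lines of Sinkhorn iteration: given document-topic affinities (in the paper, derived from an LM/LLM), alternately normalize rows and columns to obtain a balanced soft assignment. This is a generic sketch with random placeholder affinities, not EdTM's exact algorithm.

```python
# Generic Sinkhorn sketch of topic modeling as a balanced assignment problem.
import numpy as np

def sinkhorn(affinity, epsilon=0.1, iters=50):
    """Alternating row/column normalization of the Gibbs kernel."""
    plan = np.exp(affinity / epsilon)
    for _ in range(iters):
        plan /= plan.sum(axis=1, keepdims=True)    # each document sums to 1
        plan /= plan.sum(axis=0, keepdims=True)    # each topic gets equal mass
    return plan / plan.sum(axis=1, keepdims=True)  # rows as doc-topic dists

affinity = np.random.randn(100, 10)  # 100 documents x 10 topics (placeholder)
plan = sinkhorn(affinity)
print(plan.shape, plan[0].round(3))  # per-document topic distribution
```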
- GINopic: Topic Modeling with Graph Isomorphism Network [0.8962460460173959]
We introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words.
We demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
arXiv Detail & Related papers (2024-04-02T17:18:48Z)
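For readers unfamiliar with graph isomorphism networks, here is a minimal GIN layer in plain PyTorch of the kind GINopic builds on; the toy word graph and dimensions are illustrative, not the paper's construction.

```python
# Minimal GIN layer: MLP((1 + eps) * h_v + sum of neighbor features).
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable epsilon
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, h, adj):
        # Aggregate neighbors via the adjacency matrix, then transform.
        return self.mlp((1 + self.eps) * h + adj @ h)

n_words, dim = 50, 32
adj = (torch.rand(n_words, n_words) > 0.9).float()  # toy co-occurrence graph
h = torch.randn(n_words, dim)                       # word embeddings
print(GINLayer(dim)(h, adj).shape)                  # torch.Size([50, 32])
```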
- Controllable Topic-Focused Abstractive Summarization [57.8015120583044]
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects.
This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z)
- Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling [29.87929724277381]
In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation.
Existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics.
We extend short texts into longer sequences using existing pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-24T00:23:30Z)
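A hedged sketch of the extension step, assuming the Hugging Face transformers package; GPT-2 is a stand-in PLM and the prompting details are assumptions, not necessarily the paper's setup.

```python
# Hedged sketch: let a PLM "imagine" a longer context for a short text
# before topic modeling, mitigating co-occurrence sparsity.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in PLM
short_text = "quantum computing breakthrough"
extended = generator(short_text, max_new_tokens=40)
print(extended[0]["generated_text"])  # richer context for the topic model
```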
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- BERTopic: Neural topic modeling with a class-based TF-IDF procedure [0.0]
We present BERTopic, a topic model that extends the feasibility of approaching topic modeling as a clustering task.
BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
arXiv Detail & Related papers (2022-03-11T08:35:15Z)
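The class-based TF-IDF procedure is concrete enough to sketch: documents in each cluster are concatenated into one class document, and each term t in class c is weighted by tf_{t,c} * log(1 + A / f_t), where A is the average number of words per class and f_t is the term's frequency across all classes. The toy corpus below is illustrative.

```python
# Sketch of class-based TF-IDF: tf[c, t] * log(1 + A / f[t]).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

clusters = {0: ["stocks fell sharply", "markets and stocks rallied"],
            1: ["the team scored twice", "the team won the league"]}
class_docs = [" ".join(docs) for docs in clusters.values()]

vec = CountVectorizer().fit(class_docs)
tf = vec.transform(class_docs).toarray().astype(float)  # term counts per class
A = tf.sum() / tf.shape[0]                              # average words per class
ctfidf = tf * np.log(1 + A / tf.sum(axis=0))            # class-based TF-IDF

terms = vec.get_feature_names_out()
for c, row in enumerate(ctfidf):
    top = np.argsort(row)[::-1][:3]
    print(c, [terms[i] for i in top])  # top topic words per cluster
```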
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
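A generic knowledge-distillation loss of the kind such modular methods attach to a neural topic model, sketched in PyTorch; the temperature, tensor shapes, and teacher source are placeholder assumptions rather than the paper's exact formulation.

```python
# Generic distillation loss: pull the student topic model's word
# distribution toward a teacher's (e.g. a pretrained transformer).
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_probs, temperature=2.0):
    """Temperature-scaled KL divergence from teacher to student."""
    log_p = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p, teacher_probs, reduction="batchmean") * temperature**2

student_logits = torch.randn(4, 2000)                        # word logits per doc
teacher_probs = torch.softmax(torch.randn(4, 2000), dim=-1)  # placeholder teacher
print(distill_loss(student_logits, teacher_probs))
```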
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic-adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z)
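The Weibull encoder above rests on the Weibull reparameterization trick, which is simple to sketch: a Weibull draw is a deterministic transform of uniform noise, x = lam * (-log(1 - u))^(1/k), so gradients flow through the shape and scale parameters. The shapes below are placeholders.

```python
# Sketch of the Weibull reparameterization behind such variational encoders.
import torch

def weibull_sample(k, lam):
    """Differentiable Weibull draw via the inverse CDF of uniform noise."""
    u = torch.rand_like(lam)
    return lam * (-torch.log(1 - u)) ** (1 / k)

k = torch.full((3, 5), 2.0, requires_grad=True)  # shape parameters
lam = torch.ones(3, 5, requires_grad=True)       # scale parameters
print(weibull_sample(k, lam))  # nonnegative samples; gradients reach k, lam
```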