Knowledge-Aware Bayesian Deep Topic Model
- URL: http://arxiv.org/abs/2209.14228v1
- Date: Tue, 20 Sep 2022 09:16:05 GMT
- Title: Knowledge-Aware Bayesian Deep Topic Model
- Authors: Dongsheng Wang, Yishi Xu, Miaoge Li, Zhibin Duan, Chaojie Wang, Bo
Chen and Mingyuan Zhou
- Abstract summary: We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
- Score: 50.58975785318575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a Bayesian generative model for incorporating prior domain
knowledge into hierarchical topic modeling. Although embedded topic models
(ETMs) and their variants have achieved promising performance in text analysis,
they mainly focus on mining word co-occurrence patterns, ignoring potentially
easy-to-obtain prior topic hierarchies that could help enhance topic coherence.
While several knowledge-based topic models have recently been proposed, they
are either only applicable to shallow hierarchies or sensitive to the quality
of the provided prior knowledge. To this end, we develop a novel deep ETM that
jointly models the documents and the given prior knowledge by embedding the
words and topics into the same space. Guided by the provided knowledge, the
proposed model tends to discover topic hierarchies that are organized into
interpretable taxonomies. Moreover, with a technique for adapting a given graph,
our extended version allows the provided prior topic structure to be fine-tuned
to match the target corpus. Extensive experiments show that our proposed model
efficiently integrates the prior knowledge and improves both hierarchical topic
discovery and document representation.
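The shared embedding space the abstract describes can be sketched minimally: in ETM-style models, each topic's word distribution is a softmax over inner products between word embeddings and that topic's embedding. The sizes and random embeddings below are purely illustrative, not the paper's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_topics, dim = 1000, 10, 50   # illustrative sizes

rho = rng.standard_normal((vocab_size, dim))    # word embeddings
alpha = rng.standard_normal((n_topics, dim))    # topic embeddings, same space

# Topic-word distributions: softmax over the vocabulary of <word, topic> scores.
logits = rho @ alpha.T                           # shape (vocab_size, n_topics)
beta = np.exp(logits - logits.max(axis=0))       # numerically stable softmax
beta /= beta.sum(axis=0)

assert np.allclose(beta.sum(axis=0), 1.0)        # each topic is a distribution
```

Because words and topics live in one space, prior knowledge expressed over words (e.g. a given taxonomy) can directly constrain where topic embeddings sit, which is the lever this paper's model uses.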
Related papers
- Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms [6.349503549199403]
This study introduces an innovative end-to-end semantic-driven topic modeling technique for the topic extraction process.
Our model generates document embeddings using pre-trained transformer-based language models.
Compared to ChatGPT and traditional topic modeling algorithms, our model provides more coherent and meaningful topics.
arXiv Detail & Related papers (2024-09-30T18:15:31Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, as an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement [7.6115889231452964]
We introduce a novel approach termed "Topic Refinement".
This approach does not directly involve itself in the initial modeling of topics but focuses on improving topics after they have been mined.
By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically.
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
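The tree-likeness mentioned above comes from the geometry of hyperbolic space: distances grow exponentially toward the boundary of the Poincaré ball, so a general topic near the origin stays close to all of its children while specific topics near the boundary are pushed far apart. A minimal sketch of the standard Poincaré-ball distance (illustrative points, not the paper's embeddings):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v inside the unit Poincare ball."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps))

# A root topic at the origin vs. two specific topics near the boundary:
root = np.array([0.0, 0.0])
leaf_a = np.array([0.9, 0.0])
leaf_b = np.array([-0.9, 0.0])

# Root-to-leaf distances are moderate, but the two leaves are much farther
# from each other than from the root, mirroring a tree's path structure.
print(poincare_distance(root, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

This geometric bias is why hyperbolic embeddings recover hierarchies that Euclidean spaces of the same dimension struggle to represent.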
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- Keyword Assisted Embedded Topic Model [1.9000421840914223]
Probabilistic topic models describe how words in documents are generated via a set of latent distributions called topics.
Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics.
We propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors.
arXiv Detail & Related papers (2021-11-22T07:27:17Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.