Keyword Assisted Embedded Topic Model
- URL: http://arxiv.org/abs/2112.03101v1
- Date: Mon, 22 Nov 2021 07:27:17 GMT
- Title: Keyword Assisted Embedded Topic Model
- Authors: Bahareh Harandizadeh, J. Hunter Priniski, Fred Morstatter
- Abstract summary: Probabilistic topic models describe how words in documents are generated via a set of latent distributions called topics.
Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics.
We propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors.
- Score: 1.9000421840914223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By illuminating latent structures in a corpus of text, topic models are an
essential tool for categorizing, summarizing, and exploring large collections
of documents. Probabilistic topic models, such as latent Dirichlet allocation
(LDA), describe how words in documents are generated via a set of latent
distributions called topics. Recently, the Embedded Topic Model (ETM) has
extended LDA to utilize the semantic information in word embeddings to derive
semantically richer topics. As LDA and its extensions are unsupervised models,
they aren't defined to make efficient use of a user's prior knowledge of the
domain. To this end, we propose the Keyword Assisted Embedded Topic Model
(KeyETM), which equips ETM with the ability to incorporate user knowledge in
the form of informative topic-level priors over the vocabulary. Using both
quantitative metrics and human responses on a topic intrusion task, we
demonstrate that KeyETM produces better topics than other guided, generative
models in the literature.
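The abstract's central idea, incorporating user keywords as informative topic-level priors over the vocabulary, can be illustrated with a minimal sketch. This is not the authors' code: the function name, the pseudo-count scheme, and the `base`/`boost` parameters are all hypothetical, shown only to convey how seed keywords might up-weight a topic's prior mass on chosen words.

```python
import numpy as np

def build_keyword_priors(vocab, seed_keywords, base=0.1, boost=5.0):
    """Return a (num_topics, vocab_size) matrix of Dirichlet-style
    pseudo-counts: `base` everywhere, `base + boost` on seed words.
    A hypothetical illustration of keyword-informed topic priors,
    not KeyETM's actual parameterization."""
    word2id = {w: i for i, w in enumerate(vocab)}
    priors = np.full((len(seed_keywords), len(vocab)), base)
    for k, keywords in enumerate(seed_keywords):
        for w in keywords:
            if w in word2id:  # silently skip out-of-vocabulary seeds
                priors[k, word2id[w]] += boost
    return priors

# Two guided topics over a toy vocabulary: sports and politics.
vocab = ["goal", "match", "vote", "party", "league", "election"]
seeds = [["goal", "match", "league"],     # topic 0: sports keywords
         ["vote", "party", "election"]]   # topic 1: politics keywords
priors = build_keyword_priors(vocab, seeds)
```

Under this sketch, each row of `priors` concentrates prior mass on that topic's seed words while leaving a small uniform floor on the rest of the vocabulary, which is the general shape an informative topic-level prior takes regardless of the specific model it plugs into.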
Related papers
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach to label-name-supervised topic modeling.
EdTM casts topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling [0.9095496510579351]
We investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora.
Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics.
arXiv Detail & Related papers (2024-03-24T17:39:51Z)
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce the ToTER (Topical Taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
- Prompting Large Language Models for Topic Modeling [10.31712610860913]
We propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs).
It extracts topics at the sentence level from individual documents, then aggregates and condenses these topics into a predefined number, ultimately providing coherent topics for texts of varying lengths.
We benchmark PromptTopic against the state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics.
arXiv Detail & Related papers (2023-12-15T11:15:05Z)
- TopicGPT: A Prompt-based Topic Modeling Framework [77.72072691307811]
We introduce TopicGPT, a prompt-based framework that uses large language models to uncover latent topics in a text collection.
It produces topics that align better with human categorizations compared to competing methods.
Its topics are also interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions.
arXiv Detail & Related papers (2023-11-02T17:57:10Z)
- Moving beyond word lists: towards abstractive topic labels for human-like topics of scientific documents [0.0]
We present an approach to generating human-like topic labels using abstractive multi-document summarization (MDS).
We model topics in citation sentences in order to understand what further research needs to be done to fully operationalize MDS for topic labeling.
arXiv Detail & Related papers (2022-10-28T17:47:12Z)
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.