Keyword Assisted Topic Models
- URL: http://arxiv.org/abs/2004.05964v2
- Date: Wed, 10 Mar 2021 15:24:52 GMT
- Title: Keyword Assisted Topic Models
- Authors: Shusei Eshima, Kosuke Imai and Tomoya Sasaki
- Abstract summary: We show that providing a small number of keywords can substantially enhance the measurement performance of topic models.
KeyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, fully automated content analysis based on probabilistic
topic models has become popular among social scientists because of their
scalability. The unsupervised nature of the models makes them suitable for
exploring topics in a corpus without prior knowledge. However, researchers find
that these models often fail to measure specific concepts of substantive
interest by inadvertently creating multiple topics with similar content and
combining distinct themes into a single topic. In this paper, we empirically
demonstrate that providing a small number of keywords can substantially enhance
the measurement performance of topic models. An important advantage of the
proposed keyword assisted topic model (keyATM) is that the specification of
keywords requires researchers to label topics prior to fitting a model to the
data. This contrasts with a widespread practice of post-hoc topic
interpretation and adjustments that compromises the objectivity of empirical
findings. In our application, we find that keyATM provides more interpretable
results, has better document classification performance, and is less sensitive
to the number of topics than the standard topic models. Finally, we show that
keyATM can also incorporate covariates and model time trends. An open-source
software package is available for implementing the proposed methodology.
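The open-source package referenced above is keyATM, distributed for R. As a language-neutral illustration of the underlying idea, the sketch below approximates keyword seeding in Python with gensim by tilting the Dirichlet prior over each topic's word distribution (eta) toward that topic's keywords, so topics are labeled before fitting. The corpus, keywords, and prior weights are invented for illustration; this is a minimal sketch of the seeding idea, not keyATM's actual model.

```python
# Minimal sketch of keyword-seeded LDA (illustration only, not keyATM itself).
# Keywords tilt the Dirichlet prior (eta) of their topic's word distribution,
# so topic labels are fixed before the model is fit.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    "tax cuts budget deficit spending".split(),
    "troops war military defense deployment".split(),
    "tax revenue spending fiscal policy".split(),
    "army war soldiers defense budget".split(),
]

# Hypothetical researcher-specified keywords: topic 0 = fiscal, topic 1 = military.
keywords = {0: ["tax", "spending"], 1: ["war", "defense"]}

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

num_topics = 2
base_prior, seed_boost = 0.01, 1.0  # invented prior weights
eta = np.full((num_topics, len(dictionary)), base_prior)
for topic, words in keywords.items():
    for w in words:
        eta[topic, dictionary.token2id[w]] += seed_boost

lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary,
               eta=eta, alpha="auto", passes=50, random_state=0)
for t in range(num_topics):
    print(t, lda.show_topic(t, topn=4))
```

As in keyATM, the keyword-to-topic mapping is fixed before estimation, so each topic arrives pre-labeled rather than being named post hoc.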
Related papers
- Investigating the Impact of Text Summarization on Topic Modeling [13.581341206178525]
This paper proposes an approach that further enhances topic modeling performance by utilizing a pre-trained large language model (LLM).
Few-shot prompting is used to generate summaries of different lengths to compare their impact on topic modeling.
The proposed method yields better topic diversity and comparable coherence values compared to previous models.
arXiv Detail & Related papers (2024-09-28T19:45:45Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM casts topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities; a rough sketch of this assignment view follows this entry.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
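The EdTM summary above is high-level. As a rough, self-contained sketch of the assignment view it describes, the code below soft-assigns documents to topics via entropic optimal transport (Sinkhorn iterations) over a document-topic affinity matrix; the affinities, marginals, and regularization strength are placeholder assumptions, not the paper's actual algorithm.

```python
# Rough sketch: soft document-topic assignment via entropic optimal transport
# (Sinkhorn). In EdTM the affinities come from an LM/LLM; here they are random
# placeholders, and the marginals/regularization are invented assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics = 8, 3
affinity = rng.random((n_docs, n_topics))  # stand-in for LM-based affinities
cost = 1.0 - affinity                      # higher affinity, lower cost

a = np.full(n_docs, 1.0 / n_docs)          # uniform mass over documents
b = np.full(n_topics, 1.0 / n_topics)      # uniform mass over topics
reg = 0.05                                 # entropic regularization strength

K = np.exp(-cost / reg)
u = np.ones(n_docs)
for _ in range(200):                       # Sinkhorn fixed-point updates
    v = b / (K.T @ u)
    u = a / (K @ v)

plan = np.diag(u) @ K @ np.diag(v)         # doc-topic transport plan
print(plan.argmax(axis=1))                 # hard assignment per document
```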
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We report correlation coefficients with human identification of intruder words and achieve near-human-level results on the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Enhance Topics Analysis based on Keywords Properties [0.0]
We present a specificity score based on keyword properties that selects the most informative topics.
In our experiments, we show that we can compress state-of-the-art topic modeling results by various factors, with an information loss much lower than that of a solution based on the coherence score recently presented in the literature.
arXiv Detail & Related papers (2022-03-09T15:10:12Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery; a simplified two-stage sketch follows this entry.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
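The paper proposes learning the latent space and the clusters jointly; as a simplified two-stage stand-in, the sketch below embeds documents with a pretrained model, clusters the embeddings, and surfaces top words per cluster via TF-IDF. The model name, toy corpus, and clustering choices are illustrative assumptions.

```python
# Simplified two-stage stand-in for PLM-based topic discovery: embed documents
# with a pretrained model, cluster in the latent space, then surface top words
# per cluster via TF-IDF. The real method learns the space and clusters jointly.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The senate debated the new tax bill.",
    "Troops were deployed to the border region.",
    "The budget proposal cuts education spending.",
    "The army announced a new defense contract.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = model.encode(docs)

k = 2
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

# Top words per cluster: TF-IDF over each cluster's concatenated documents.
vectorizer = TfidfVectorizer(stop_words="english")
cluster_texts = [" ".join(d for d, l in zip(docs, labels) if l == c) for c in range(k)]
tfidf = vectorizer.fit_transform(cluster_texts).toarray()
terms = np.array(vectorizer.get_feature_names_out())
for c in range(k):
    print(c, terms[tfidf[c].argsort()[::-1][:4]])
```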
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches on problems arising in short texts.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets covering diverse forms of online conversation: news comments, discussion forums, community question-answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Query-Driven Topic Model [23.07260625816975]
One desirable property of topic models is to allow users to find topics describing a specific aspect of the corpus.
We propose a novel query-driven topic model that allows users to specify a simple query in words or phrases and returns query-related topics; a naive scoring sketch follows this entry.
arXiv Detail & Related papers (2021-05-28T22:49:42Z)
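The abstract does not spell out the model, so the following is only a naive baseline for the behavior it describes: rank the topics of an already-fitted model by the probability mass they place on the query words. The vocabulary and topic-word matrix are invented placeholders.

```python
# Naive baseline for the query-driven behavior: rank the topics of an
# already-fitted topic model by the probability mass they place on the
# query words. Vocabulary and topic-word matrix are invented placeholders.
import numpy as np

vocab = ["tax", "spending", "war", "defense", "school", "teacher"]
beta = np.array([                 # invented topic-word distributions (rows sum to 1)
    [0.40, 0.35, 0.05, 0.05, 0.10, 0.05],  # fiscal topic
    [0.05, 0.05, 0.45, 0.35, 0.05, 0.05],  # military topic
    [0.05, 0.10, 0.05, 0.05, 0.40, 0.35],  # education topic
])

def query_topics(query_words, beta, vocab, top_n=2):
    idx = [vocab.index(w) for w in query_words if w in vocab]
    scores = beta[:, idx].sum(axis=1)       # mass placed on the query words
    return np.argsort(scores)[::-1][:top_n]

print(query_topics(["war", "defense"], beta, vocab))  # military topic ranks first
```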
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality; a generic sketch of the distillation loss follows this entry.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
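As a generic illustration of the distillation mechanic (not the paper's specific objective), the sketch below trains a student's document-topic logits to match a teacher's softened distribution with a standard temperature-scaled knowledge-distillation loss in PyTorch; all tensor shapes are toy placeholders.

```python
# Generic temperature-scaled knowledge-distillation loss (not the paper's
# specific objective): a student's document-topic logits are trained to match
# a teacher's softened distribution. Shapes and tensors are toy placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between softened teacher and student topic distributions."""
    student_logp = F.log_softmax(student_logits / T, dim=-1)
    teacher_p = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * T * T

student = torch.randn(4, 10, requires_grad=True)  # 4 documents, 10 topics
teacher = torch.randn(4, 10)                      # e.g., from a transformer head
loss = distillation_loss(student, teacher)
loss.backward()                                   # gradients flow to the student
print(loss.item())
```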
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.