The Geometric Structure of Topic Models
- URL: http://arxiv.org/abs/2403.03607v1
- Date: Wed, 6 Mar 2024 10:53:51 GMT
- Title: The Geometric Structure of Topic Models
- Authors: Johannes Hirth, Tom Hanika
- Abstract summary: Despite their widespread use in research and application, an in-depth analysis of topic models is still an open research topic.
We propose an incidence-geometric method for deriving an ordinal structure from flat topic models.
We present a new visualization paradigm for concept hierarchies based on ordinal motifs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic models are a popular tool for clustering and analyzing textual data.
They allow texts to be classified on the basis of their affiliation to the
previously calculated topics. Despite their widespread use in research and
application, an in-depth analysis of topic models is still an open research
topic. State-of-the-art methods for interpreting topic models are based on
simple visualizations, such as similarity matrices, top-term lists or
embeddings, which are limited to a maximum of three dimensions. In this paper,
we propose an incidence-geometric method for deriving an ordinal structure from
flat topic models, such as non-negative matrix factorization. This structure
enables the analysis of the topic model in a higher (order) dimension and the
extraction of conceptual relationships between several topics at once. Due to
the use of conceptual scaling, our approach does not introduce any artificial
topical relationships, such as artifacts of feature compression. Based on our
findings, we present a new visualization paradigm for concept hierarchies based
on ordinal motifs. These allow for a top-down view on topic spaces. We
introduce and demonstrate the applicability of our approach based on a topic
model derived from a corpus of scientific papers taken from 32 top machine
learning venues.
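The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple threshold as a stand-in for conceptual scaling: a toy topic-term weight matrix (as a non-negative matrix factorization might produce) is turned into a binary incidence relation, and the formal concepts of that relation are enumerated by brute force. The weights, the `THRESHOLD` value, and the function names `to_context` and `formal_concepts` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy weights (rows: terms, columns: topics); the values are
# made up for illustration and do not come from the paper.
weights = {
    "neural":   {"t0": 0.9, "t1": 0.4, "t2": 0.0},
    "kernel":   {"t0": 0.1, "t1": 0.8, "t2": 0.0},
    "bayesian": {"t0": 0.0, "t1": 0.5, "t2": 0.7},
    "graph":    {"t0": 0.6, "t1": 0.0, "t2": 0.6},
}
THRESHOLD = 0.3  # assumed cut-off, standing in for a proper scaling step


def to_context(weights, threshold):
    """Binarize: a term is incident with a topic if its weight >= threshold."""
    return {
        term: frozenset(t for t, w in row.items() if w >= threshold)
        for term, row in weights.items()
    }


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a binary context.

    Every intent is an intersection of term rows (the empty intersection
    being the full topic set), so brute force suffices for toy inputs.
    """
    all_topics = frozenset().union(*context.values())
    intents = {all_topics}
    for size in range(1, len(context) + 1):
        for terms in combinations(context, size):
            intents.add(frozenset.intersection(*(context[t] for t in terms)))
    concepts = []
    for intent in intents:
        # The extent collects every term that has all topics of the intent.
        extent = frozenset(t for t, row in context.items() if intent <= row)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))


if __name__ == "__main__":
    for extent, intent in formal_concepts(to_context(weights, THRESHOLD)):
        print(sorted(intent), "<-", sorted(extent))
```

Ordering the resulting concepts by inclusion of their extents yields a hierarchy over topic combinations, which is the kind of ordinal structure the abstract refers to. The brute-force enumeration is exponential in the number of terms and is only meant for small illustrative inputs.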
Related papers
- Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms [6.349503549199403]
This study introduces an innovative end-to-end semantic-driven topic modeling technique for the topic extraction process.
Our model generates document embeddings using pre-trained transformer-based language models.
Compared to ChatGPT and traditional topic modeling algorithms, our model provides more coherent and meaningful topics.
arXiv Detail & Related papers (2024-09-30T18:15:31Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- GINopic: Topic Modeling with Graph Isomorphism Network [0.8962460460173959]
We introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words.
We demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
arXiv Detail & Related papers (2024-04-02T17:18:48Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Learning Topic Models: Identifiability and Finite-Sample Analysis [6.181048261489101]
We propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood.
We conclude with empirical studies on both simulated and real datasets.
arXiv Detail & Related papers (2021-10-08T16:35:42Z)
- Semiparametric Latent Topic Modeling on Consumer-Generated Corpora [0.0]
This paper proposes a semiparametric topic model, a two-step approach utilizing nonnegative matrix factorization and semiparametric regression in topic modeling.
The model enables the reconstruction of sparse topic structures in the corpus and provides a generative model for predicting topics in new documents entering the corpus.
In an actual consumer feedback corpus, the model also demonstrably provides interpretable and useful topic definitions comparable with those produced by other methods.
arXiv Detail & Related papers (2021-07-13T00:22:02Z)
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix's applicability is confirmed via different examples, showing how it can be used in practice to promote the interpretability of RF models.
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
- Keyword Assisted Topic Models [0.0]
We show that providing a small number of keywords can substantially enhance the measurement performance of topic models.
KeyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models.
arXiv Detail & Related papers (2020-04-13T14:35:28Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.