Related papers: Determination of the Number of Topics Intrinsically: Is It Possible?

URL: http://arxiv.org/abs/2406.10402v1
Date: Fri, 14 Jun 2024 20:07:46 GMT
Title: Determination of the Number of Topics Intrinsically: Is It Possible?
Authors: Victor Bulatov, Vasiliy Alekseev, Konstantin Vorontsov
Abstract summary: This study investigates the performance of various methods applied to several topic models on a number of publicly available corpora. The number of topics is shown to be a method- and a model-dependent quantity, as opposed to being an absolute property of a particular corpus.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The number of topics might be the most important parameter of a topic model. The topic modelling community has developed a set of various procedures to estimate the number of topics in a dataset, but there has not yet been a sufficiently complete comparison of existing practices. This study attempts to partially fill this gap by investigating the performance of various methods applied to several topic models on a number of publicly available corpora. Further analysis demonstrates that intrinsic methods are far from being reliable and accurate tools. The number of topics is shown to be a method- and a model-dependent quantity, as opposed to being an absolute property of a particular corpus. We conclude that other methods for dealing with this problem should be developed and suggest some promising directions for further research.

Related papers

Iterative Improvement of an Additively Regularized Topic Model [0.0]
We present a method for iterative training of a topic model. Experiments conducted on several collections of natural language texts show that the proposed ITAR model performs better than other popular topic models.
arXiv Detail & Related papers (2024-08-11T18:22:12Z)
An Iterative Approach to Topic Modelling [0.0]
We propose to use an iterative process to perform topic modelling that gives rise to a sense of completeness of the resulting topics when the process is complete. We demonstrate how the modelling process can be applied iteratively to arrive at a set of topics that could not be further improved upon using one of the three selected measures for clustering comparison.
arXiv Detail & Related papers (2024-07-25T09:26:07Z)
Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling. EdTM treats topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
TopicAdapt- An Inter-Corpora Topics Adaptation Approach [27.450275637652418]
This paper proposes a neural topic model, TopicAdapt, that can adapt relevant topics from a related source corpus and also discover new topics in a target corpus that are absent in the source corpus. Experiments over multiple datasets from diverse domains show the superiority of the proposed model against the state-of-the-art topic models.
arXiv Detail & Related papers (2023-10-08T02:56:44Z)
Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes. This allows our model to detect latent topics that may include uncommon words or neologisms. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model and which of its parts could be replaced to obtain better results. We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation. We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution. We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z)
Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model. We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)
Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models. High-dimensionality and non-linear issues are traditionally handled by kernel methods. We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
Marginal likelihood computation for model selection and hypothesis testing: an extensive review [66.37504201165159]
This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described.
arXiv Detail & Related papers (2020-05-17T18:31:58Z)
Keyword Assisted Topic Models [0.0]
We show that providing a small number of keywords can substantially enhance the measurement performance of topic models. KeyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models.
arXiv Detail & Related papers (2020-04-13T14:35:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.