Analysis and tuning of hierarchical topic models based on Renyi entropy approach
- URL: http://arxiv.org/abs/2101.07598v1
- Date: Tue, 19 Jan 2021 12:54:47 GMT
- Title: Analysis and tuning of hierarchical topic models based on Renyi entropy approach
- Authors: Sergei Koltcov, Vera Ignatenko, Maxim Terpilovskii, Paolo Rosso
- Abstract summary: Tuning of parameters of hierarchical models, including the number of topics on each hierarchical level, remains a challenging task.
In this paper, we propose a Renyi entropy-based approach for a partial solution to the above problem.
- Score: 5.487882744996213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical topic modeling is a potentially powerful instrument for
determining the topical structure of text collections that allows constructing
a topical hierarchy representing levels of topical abstraction. However, tuning
of parameters of hierarchical models, including the number of topics on each
hierarchical level, remains a challenging task and an open issue. In this
paper, we propose a Renyi entropy-based approach for a partial solution to the
above problem. First, we propose a Renyi entropy-based metric of quality for
hierarchical models. Second, we propose a practical concept of hierarchical
topic model tuning tested on datasets with human mark-up. In the numerical
experiments, we consider three different hierarchical models, namely,
hierarchical latent Dirichlet allocation (hLDA) model, hierarchical Pachinko
allocation model (hPAM), and hierarchical additive regularization of topic
models (hARTM). We demonstrate that the hLDA model possesses a significant
level of instability and, moreover, that the derived numbers of topics are far
from the true numbers for the labeled datasets. For the hPAM model, the Renyi
entropy approach allows us to determine only one level of the data structure.
For the hARTM model, the proposed approach allows us to estimate the number of
topics for two hierarchical levels.
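The Renyi entropy metric referred to above can be illustrated with a short sketch. The formulation below follows the thermodynamic approach from Koltcov's earlier work on flat topic models and is an assumption, not code from this paper: entries of the word-topic matrix above 1/W are treated as "high-probability" states, and the Renyi entropy with deformation parameter q = 1/T is computed from the resulting free energy; the T that minimizes it is taken as the estimated number of topics on a given hierarchy level.

```python
import numpy as np

def renyi_entropy(phi: np.ndarray) -> float:
    """Renyi entropy of a word-topic matrix phi (shape W x T).

    A simplified sketch of the thermodynamic formulation (q = 1/T,
    temperature ~ number of topics); the exact normalization is an
    assumption, not taken from the abstract above.
    """
    W, T = phi.shape
    mask = phi > 1.0 / W              # "high-probability" states
    P_tilde = phi[mask].sum() / T     # their total probability, per topic
    N = mask.sum()                    # number of such states
    rho = N / (W * T)                 # density of states
    E = -np.log(P_tilde)              # internal energy
    S = np.log(rho)                   # Gibbs-Shannon-like entropy
    F = E - T * S                     # free energy
    return F / (T - 1)                # Renyi entropy, q = 1/T

def pick_num_topics(train, candidate_ts, fit_model):
    """Hypothetical tuning loop: fit_model(train, T) is a placeholder
    returning a W x T word-topic matrix; pick the T with minimal entropy."""
    entropies = {T: renyi_entropy(fit_model(train, T)) for T in candidate_ts}
    return min(entropies, key=entropies.get)
```

For a hierarchical model, the same sweep would be repeated per level, which is how the abstract's per-level estimates for hARTM can be read.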
Related papers
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- The Geometric Structure of Topic Models [0.0]
Despite their widespread use in research and application, an in-depth analysis of topic models is still an open research topic.
We propose an incidence-geometric method for deriving an ordinal structure from flat topic models.
We present a new visualization paradigm for concept hierarchies based on ordinal motifs.
arXiv Detail & Related papers (2024-03-06T10:53:51Z)
- Learning Hierarchical Features with Joint Latent Space Energy-Based Prior [44.4434704520236]
We study the fundamental problem of multi-layer generator models in learning hierarchical representations.
We propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning.
arXiv Detail & Related papers (2023-10-14T15:44:14Z)
- HyHTM: Hyperbolic Geometry based Hierarchical Topic Models [9.583526547108349]
Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents.
We present HyHTM, a hyperbolic geometry based hierarchical topic model.
arXiv Detail & Related papers (2023-05-16T08:06:11Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Hierarchical Variational Memory for Few-shot Learning Across Domains [120.87679627651153]
We introduce a hierarchical prototype model, where each level of the prototype fetches corresponding information from the hierarchical memory.
The model is endowed with the ability to flexibly rely on features at different semantic levels if the domain shift circumstances so demand.
We conduct thorough ablation studies to demonstrate the effectiveness of each component in our model.
arXiv Detail & Related papers (2021-12-15T15:01:29Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification [70.554573538777]
We argue for hierarchical evaluation of the predictions of neural LMTC models.
We describe a structural issue in the representation of the structured label space in prior art.
We propose a set of metrics for hierarchical evaluation using the depth-based representation.
arXiv Detail & Related papers (2021-09-10T13:09:12Z)
- Learning deep autoregressive models for hierarchical data [0.6445605125467573]
We propose a model for hierarchically structured data as an extension of the stochastic temporal convolutional network (STCN).
We evaluate the proposed model on two different types of sequential data: speech and handwritten text.
arXiv Detail & Related papers (2021-04-28T15:58:45Z)
- Hierarchical Representation via Message Propagation for Robust Model Fitting [28.03005930782681]
We propose a novel hierarchical representation via message propagation (HRMP) method for robust model fitting.
We formulate the consensus information and the preference information as a hierarchical representation to alleviate the sensitivity to gross outliers.
The proposed HRMP can not only accurately estimate the number and parameters of multiple model instances, but also handle multi-structural data contaminated with a large number of outliers.
arXiv Detail & Related papers (2020-12-29T04:14:19Z)
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.