On the Affinity, Rationality, and Diversity of Hierarchical Topic
Modeling
- URL: http://arxiv.org/abs/2401.14113v2
- Date: Thu, 1 Feb 2024 03:47:28 GMT
- Title: On the Affinity, Rationality, and Diversity of Hierarchical Topic
Modeling
- Authors: Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu,
Cong-Duy Nguyen, Anh Tuan Luu
- Abstract summary: We propose the Transport Plan and Context-aware Hierarchical Topic Model (TraCo).
TraCo constrains topic dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them.
Rather than the entangled decoding of previous methods, it distributes different semantic granularities to topics at different levels through disentangled decoding.
- Score: 29.277151061615434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical topic modeling aims to discover latent topics from a corpus and
organize them into a hierarchy to understand documents with desirable semantic
granularity. However, existing work tends to produce topic hierarchies with low
affinity, rationality, and diversity, which hampers document understanding. To
overcome these challenges, in this paper we propose the Transport Plan and
Context-aware Hierarchical Topic Model (TraCo). Instead of the simple topic
dependencies used in earlier methods, we propose a transport plan dependency
method. It constrains dependencies to ensure their sparsity and balance, and
also regularizes topic hierarchy building with them. This improves the affinity
and diversity of hierarchies. We further propose a context-aware disentangled
decoder. Rather than the entangled decoding of previous methods, it distributes
different semantic granularities to topics at different levels through
disentangled decoding.
This facilitates the rationality of hierarchies. Experiments on benchmark
datasets demonstrate that our method surpasses state-of-the-art baselines,
effectively improving the affinity, rationality, and diversity of hierarchical
topic modeling with better performance on downstream tasks.
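The abstract describes the transport plan dependency method only at a high level. As a minimal illustrative sketch (not the authors' TraCo implementation), the snippet below treats child-to-parent topic dependencies as an entropic optimal-transport plan computed with Sinkhorn iterations over hypothetical topic embeddings: fixing the row and column marginals keeps the dependencies balanced, and a small regularization weight eps pushes the plan toward sparsity. The function name, inputs, and parameters are illustrative assumptions, not taken from the paper.

# Illustrative sketch only: a Sinkhorn-style transport plan between child and
# parent topic embeddings, yielding a balanced (fixed marginals) and
# increasingly sparse (small eps) dependency matrix.
import numpy as np

def transport_plan_dependency(child_emb, parent_emb, eps=0.05, n_iters=200):
    """Entropic OT plan over the cost matrix of topic-embedding distances.

    child_emb: (K_child, D) child-topic embeddings (hypothetical inputs)
    parent_emb: (K_parent, D) parent-topic embeddings
    Returns a (K_child, K_parent) dependency matrix whose rows and columns
    sum to uniform marginals.
    """
    # Pairwise Euclidean costs between child and parent topic embeddings.
    cost = np.linalg.norm(child_emb[:, None, :] - parent_emb[None, :, :], axis=-1)
    K = np.exp(-cost / eps)                                       # Gibbs kernel
    a = np.full(child_emb.shape[0], 1.0 / child_emb.shape[0])     # child marginal
    b = np.full(parent_emb.shape[0], 1.0 / parent_emb.shape[0])   # parent marginal
    u = np.ones_like(a)
    for _ in range(n_iters):                                      # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]                            # the transport plan

# Example: 20 child topics, 5 parent topics, 64-dim embeddings.
rng = np.random.default_rng(0)
plan = transport_plan_dependency(rng.normal(size=(20, 64)),
                                 rng.normal(size=(5, 64)))
print(plan.shape, plan.sum(axis=0))   # columns each sum to ~1/5 (balanced)

In this sketch, shrinking eps concentrates each child topic's mass on fewer parents (sparser dependencies), while the fixed marginals prevent any parent topic from absorbing all children; this is the balance/sparsity intuition behind the transport plan dependency described above.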
Related papers
- HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy.
We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z)
- Reinforcement Learning with Options and State Representation [105.82346211739433]
This thesis aims to explore the reinforcement learning field and build on existing methods to produce improved ones.
It addresses such goals by decomposing learning tasks in a hierarchical fashion known as Hierarchical Reinforcement Learning.
arXiv Detail & Related papers (2024-03-16T08:30:55Z)
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- An End-to-End Document-Level Neural Discourse Parser Exploiting Multi-Granularity Representations [24.986030179701405]
We exploit robust representations derived from multiple levels of granularity across syntax and semantics.
We incorporate such representations in an end-to-end encoder-decoder neural architecture for more resourceful discourse processing.
arXiv Detail & Related papers (2020-12-21T08:01:04Z)
- Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding [37.7780399311715]
Hierarchical Topic Mining aims to mine a set of representative terms for each category from a text corpus to help users comprehend their topics of interest.
Our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
arXiv Detail & Related papers (2020-07-18T23:30:47Z)