Self-supervised Topic Taxonomy Discovery in the Box Embedding Space
- URL: http://arxiv.org/abs/2408.15050v1
- Date: Tue, 27 Aug 2024 13:19:32 GMT
- Title: Self-supervised Topic Taxonomy Discovery in the Box Embedding Space
- Authors: Yuyin Lu, Hegang Chen, Pengbo Mao, Yanghui Rao, Haoran Xie, Fu Lee Wang, Qing Li
- Abstract summary: This paper develops a Box embedding-based Topic Model (BoxTM) that maps words and topics into the box embedding space.
Our BoxTM explicitly infers upper-level topics based on correlation between specific topics.
Extensive experiments validate the high quality of the topic taxonomy learned by BoxTM.
- Score: 23.942807248774514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic taxonomy discovery aims at uncovering topics of different abstraction levels and constructing hierarchical relations between them. Unfortunately, most prior work can hardly model the semantic scopes of words and topics because it assumes a Euclidean embedding space. Worse, these methods infer asymmetric hierarchical relations from symmetric distances between topic embeddings. As a result, existing methods suffer from low-quality topics at high abstraction levels and inaccurate hierarchical relations. To alleviate these problems, this paper develops a Box embedding-based Topic Model (BoxTM) that maps words and topics into the box embedding space, where an asymmetric metric is defined to properly infer hierarchical relations among topics. Additionally, BoxTM explicitly infers upper-level topics based on the correlation between specific topics through recursive clustering on topic boxes. Finally, extensive experiments validate the high quality of the topic taxonomy learned by BoxTM.
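To illustrate why a box embedding space can express asymmetric relations where a symmetric distance cannot, here is a minimal sketch. It assumes hard axis-aligned boxes and a simple volume-ratio containment score; the abstract does not specify BoxTM's actual metric, so the function names (`box_volume`, `containment`) and the toy boxes are hypothetical and serve only as an illustration.

```python
import numpy as np

def box_volume(lower, upper):
    """Volume of an axis-aligned box; zero if any side has non-positive length."""
    sides = np.clip(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float), 0.0, None)
    return float(np.prod(sides))

def containment(child, parent):
    """Fraction of the child box that lies inside the parent box.

    Computed as Vol(child ∩ parent) / Vol(child). This is an illustrative
    score, not the metric defined in the paper.
    """
    (c_lo, c_hi), (p_lo, p_hi) = child, parent
    # The intersection of two axis-aligned boxes is itself a box (possibly empty).
    inter_lo = np.maximum(c_lo, p_lo)
    inter_hi = np.minimum(c_hi, p_hi)
    vol_child = box_volume(c_lo, c_hi)
    if vol_child == 0.0:
        return 0.0
    return box_volume(inter_lo, inter_hi) / vol_child

# Hypothetical toy boxes: a broad "general" topic and a narrow "specific" topic.
general = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
specific = (np.array([1.0, 1.0]), np.array([2.0, 3.0]))

print(containment(specific, general))  # 1.0: the specific box is fully inside the general one
print(containment(general, specific))  # 0.125: only a small part of the general box is covered
```

The two scores differ depending on the direction of the comparison, which is the property a parent-child relation needs; a Euclidean distance between two topic vectors would return the same value both ways.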
Related papers
- On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling [29.277151061615434]
We propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo).
TraCo constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them.
Instead of the entangled decoding used previously, it distributes different semantic granularities to topics at different levels through disentangled decoding.
arXiv Detail & Related papers (2024-01-25T11:47:58Z)
- Effective Neural Topic Modeling with Embedding Clustering Regularization [21.692088899479934]
We propose a new neural topic model, Embedding Clustering Regularization Topic Model (ECRTM).
ECRTM forces each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space.
Our ECRTM generates diverse and coherent topics together with high-quality topic distributions of documents.
arXiv Detail & Related papers (2023-06-07T07:45:38Z)
- HyHTM: Hyperbolic Geometry based Hierarchical Topic Models [9.583526547108349]
Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents.
We present HyHTM, a hyperbolic geometry-based hierarchical topic model.
arXiv Detail & Related papers (2023-05-16T08:06:11Z)
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation [58.3921103230647]
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output.
arXiv Detail & Related papers (2022-10-18T22:38:49Z)
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network [49.458250193768826]
We propose sawtooth factorial topic embedding guided GBN, a deep generative model of documents.
Both the words and topics are represented as embedding vectors of the same dimension.
Our models outperform other neural topic models on extracting deeper interpretable topics.
arXiv Detail & Related papers (2021-06-30T10:14:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.