Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
- URL: http://arxiv.org/abs/2007.09536v1
- Date: Sat, 18 Jul 2020 23:30:47 GMT
- Title: Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
- Authors: Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han
- Abstract summary: Hierarchical Topic Mining aims to mine a set of representative terms for each category from a text corpus to help users understand the topics they are interested in.
Our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
- Score: 37.7780399311715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mining a set of meaningful topics organized into a hierarchy is intuitively
appealing since topic correlations are ubiquitous in massive text corpora. To
account for potential hierarchical topic structures, hierarchical topic models
generalize flat topic models by incorporating latent topic hierarchies into
their generative modeling process. However, due to their purely unsupervised
nature, the learned topic hierarchy often deviates from users' particular needs
or interests. To guide the hierarchical topic discovery process with minimal
user supervision, we propose a new task, Hierarchical Topic Mining, which takes
a category tree described by category names only, and aims to mine a set of
representative terms for each category from a text corpus to help users
understand the topics they are interested in. We develop a novel joint tree and text
embedding method along with a principled optimization procedure that allows
simultaneous modeling of the category tree structure and the corpus generative
process in the spherical space for effective category-representative term
discovery. Our comprehensive experiments show that our model, named JoSH, mines
a high-quality set of hierarchical topics with high efficiency and benefits
weakly-supervised hierarchical text classification tasks.
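To make the task's input and output concrete, here is a minimal sketch of mining representative terms for each named category by cosine similarity on the unit sphere. It is not the JoSH method: it uses fixed toy vectors, does no joint training, and ignores the tree structure when ranking terms. The vocabulary, the vectors, the `category_parent` mapping, and the `top_terms` helper are all hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical input: a category tree described by category names only
# (child -> parent; None marks a top-level category).
category_parent = {"sports": None, "tennis": "sports", "science": None}

# Toy, hand-made 3-d vectors standing in for learned text embeddings.
vocab = ["sports", "athlete", "tennis", "racket", "science", "physics"]
raw = np.array([
    [1.0, 0.0, 0.0],  # sports
    [0.9, 0.1, 0.0],  # athlete
    [0.8, 0.6, 0.0],  # tennis
    [0.7, 0.7, 0.0],  # racket
    [0.0, 0.0, 1.0],  # science
    [0.1, 0.0, 0.9],  # physics
])

# Project every vector onto the unit sphere so that a dot product
# equals cosine similarity.
emb = raw / np.linalg.norm(raw, axis=1, keepdims=True)
index = {word: i for i, word in enumerate(vocab)}


def top_terms(category, k=2):
    """Return the k vocabulary terms closest to the category name's vector."""
    sims = emb @ emb[index[category]]
    order = np.argsort(-sims)  # indices sorted by descending similarity
    return [vocab[i] for i in order if vocab[i] != category][:k]


if __name__ == "__main__":
    for name, parent in category_parent.items():
        print(name, "(parent:", parent, ")", top_terms(name))
```

A real system would learn the embeddings from the corpus and would use the parent-child relations to keep each category's terms distinct from those of its siblings and ancestors; this sketch only shows the shape of the problem.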
Related papers
- On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling [29.277151061615434]
We propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo).
TraCo constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them.
Instead of the entangled decoding used previously, it uses disentangled decoding to assign different semantic granularity to topics at different levels.
arXiv Detail & Related papers (2024-01-25T11:47:58Z)
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation [58.3921103230647]
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output.
arXiv Detail & Related papers (2022-10-18T22:38:49Z)
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Generating Categories for Sets of Entities [34.32017697099142]
Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities.
This paper presents a method of generating categories for sets of entities using neural abstractive summarization models.
We develop a test collection based on Wikipedia categories and demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-08-19T13:31:07Z)