Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation
- URL: http://arxiv.org/abs/2211.01981v1
- Date: Tue, 18 Oct 2022 22:38:49 GMT
- Title: Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation
- Authors: Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu, Jiawei Han
- Abstract summary: We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output taxonomies.
- Score: 58.3921103230647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic taxonomies display hierarchical topic structures of a text corpus and
provide topical knowledge to enhance various NLP applications. To dynamically
incorporate new topic information, several recent studies have tried to expand
(or complete) a topic taxonomy by inserting emerging topics identified in a set
of new documents. However, existing methods focus only on frequent terms in
documents and the local topic-subtopic relations in a taxonomy, which leads to
limited topic term coverage and fails to model the global topic hierarchy. In
this work, we propose a novel framework for topic taxonomy expansion, named
TopicExpan, which directly generates topic-related terms belonging to new
topics. Specifically, TopicExpan leverages the hierarchical relation structure
surrounding a new topic and the textual content of an input document for topic
term generation. This approach encourages newly-inserted topics to further
cover important but less frequent terms as well as to keep their relation
consistency within the taxonomy. Experimental results on two real-world text
corpora show that TopicExpan significantly outperforms other baseline methods
in terms of the quality of output taxonomies.
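As a rough illustration of what "the hierarchical relation structure surrounding a new topic" plus "the textual content of an input document" could look like as generation inputs, the Python sketch below models a topic taxonomy as a tree, gathers the would-be parent, grandparent, and sibling topics of a new child position, and flattens them together with a document into one conditioning string. All names (TopicNode, surrounding_structure, generation_context) are hypothetical, the choice of neighbors is an assumption, and the string format is only a stand-in: TopicExpan's actual encoder and generator are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TopicNode:
    """A topic in a taxonomy (hypothetical structure for illustration)."""
    name: str
    terms: List[str] = field(default_factory=list)
    # Back-pointer excluded from repr/eq to avoid parent-child recursion.
    parent: Optional["TopicNode"] = field(default=None, repr=False, compare=False)
    children: List["TopicNode"] = field(default_factory=list)

    def add_child(self, child: "TopicNode") -> "TopicNode":
        child.parent = self
        self.children.append(child)
        return child


def surrounding_structure(parent: TopicNode) -> Dict[str, List[str]]:
    """Topics around a new child of `parent` (assumed neighborhood).

    parent      -> the topic the new subtopic would be attached under
    grandparent -> the parent's own parent, if any
    siblings    -> existing children of `parent`
    """
    grandparent = [parent.parent.name] if parent.parent else []
    return {
        "parent": [parent.name],
        "grandparent": grandparent,
        "siblings": [c.name for c in parent.children],
    }


def generation_context(parent: TopicNode, document: str) -> str:
    """Flatten structure + document into one string (a simplification)."""
    s = surrounding_structure(parent)
    parts = [f"{rel}: {', '.join(names)}" for rel, names in s.items() if names]
    return " | ".join(parts) + " || document: " + document


# Usage example on a toy taxonomy.
root = TopicNode("root")
sports = root.add_child(TopicNode("sports", ["game", "team"]))
sports.add_child(TopicNode("soccer", ["goal", "striker"]))
sports.add_child(TopicNode("tennis", ["serve", "racket"]))
print(generation_context(sports, "The pitcher threw a no-hitter."))
# parent: sports | grandparent: root | siblings: soccer, tennis || document: The pitcher threw a no-hitter.
```

A term generator conditioned on such a context would be expected to produce terms for a new subtopic under "sports" that stay distinct from its siblings' terms; this sketch only prepares the inputs.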
Related papers
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
Date: 2024-03-07T02:34:54Z
- Effective Neural Topic Modeling with Embedding Clustering Regularization [21.692088899479934]
We propose a new neural topic model, Embedding Clustering Regularization Topic Model (ECRTM).
ECRTM forces each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space.
Our ECRTM generates diverse and coherent topics together with high-quality topic distributions of documents.
Date: 2023-06-07T07:45:38Z
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
Date: 2022-10-16T02:54:17Z
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
Date: 2022-09-20T09:16:05Z
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
Date: 2022-01-18T07:07:38Z
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
Date: 2021-10-27T09:07:14Z
- Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding [37.7780399311715]
Hierarchical Topic Mining aims to mine a set of representative terms for each category from a text corpus to help users understand the topics they are interested in.
Our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
Date: 2020-07-18T23:30:47Z