TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel
Topic Clusters
- URL: http://arxiv.org/abs/2201.06771v2
- Date: Wed, 19 Jan 2022 20:02:10 GMT
- Title: TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel
Topic Clusters
- Authors: Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo
Yu
- Abstract summary: We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
- Score: 57.59286394188025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic taxonomies, which represent the latent topic (or category) structure of
document collections, provide valuable knowledge of contents in many
applications such as web search and information filtering. Recently, several
unsupervised methods have been developed to automatically construct the topic
taxonomy from a text corpus, but it is challenging to generate the desired
taxonomy without any prior knowledge. In this paper, we study how to leverage
the partial (or incomplete) information about the topic structure as guidance
to find out the complete topic taxonomy. We propose a novel framework for topic
taxonomy completion, named TaxoCom, which recursively expands the topic
taxonomy by discovering novel sub-topic clusters of terms and documents. To
effectively identify novel topics within a hierarchical topic structure,
TaxoCom devises its embedding and clustering techniques to be closely linked
with each other: (i) locally discriminative embedding optimizes the text
embedding space to be discriminative among known (i.e., given) sub-topics, and
(ii) novelty adaptive clustering assigns terms into either one of the known
sub-topics or novel sub-topics. Our comprehensive experiments on two real-world
datasets demonstrate that TaxoCom not only generates a high-quality topic
taxonomy in terms of term coherency and topic coverage but also outperforms all
other baselines for a downstream task.
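As a rough illustration of the idea behind step (ii), the sketch below assigns each term either to its most similar known sub-topic or to a "novel" bucket. It is a minimal sketch under stated assumptions, not the procedure from the paper: the cosine-similarity threshold rule, the function name assign_terms, the NOVEL label, the 0.8 threshold, and the 2-D toy vectors are all hypothetical choices made for this example.

```python
import math

NOVEL = "<novel>"  # hypothetical label for terms that fit no known sub-topic


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_terms(term_vecs, known_centers, threshold=0.8):
    """Map each term to its closest known sub-topic, or to NOVEL.

    Assumption for illustration only: a term counts as novel when its best
    cosine similarity to every known sub-topic center is below `threshold`.
    """
    assignments = {}
    for term, vec in term_vecs.items():
        best_topic, best_sim = None, -1.0
        for topic, center in known_centers.items():
            sim = cosine(vec, center)
            if sim > best_sim:
                best_topic, best_sim = topic, sim
        assignments[term] = best_topic if best_sim >= threshold else NOVEL
    return assignments


# Toy 2-D embeddings (made up for this example).
KNOWN = {
    "machine_learning": [1.0, 0.0],
    "databases": [0.0, 1.0],
}
TERMS = {
    "neural_network": [0.9, 0.1],
    "sql_query": [0.1, 0.95],
    "cryptography": [0.7, 0.7],  # equally far from both known sub-topics
}

result = assign_terms(TERMS, KNOWN, threshold=0.8)
```

In this toy setup, "cryptography" sits between the two known sub-topics and falls below the threshold for both, so it lands in the novel bucket. A fuller pipeline would then group the novel terms into new sub-topic clusters and recurse on each child node.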
Related papers
- Improving Retrieval in Theme-specific Applications using a Corpus
Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation [58.3921103230647]
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output.
arXiv Detail & Related papers (2022-10-18T22:38:49Z)
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
- TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic
Representations [28.65753036636082]
We propose a new taxonomy completion framework, which effectively leverages both semantic features and structural information in the existing taxonomy.
TaxoEnrich consists of four components, including: (1) a taxonomy-contextualized embedding that incorporates both the semantic meaning of each concept and taxonomic relations, based on powerful pretrained language models; and (2) a taxonomy-aware sequential encoder that learns candidate position representations by encoding the structural information of the taxonomy.
Experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance among all evaluation metrics and outperforms previous state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-02-10T08:10:43Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and
Relation Transferring [37.1330815281983]
We propose a method for seed-guided topical taxonomy construction, which takes a corpus and a seed taxonomy described by concept names as input.
A relation transferring module learns the relation the user is interested in and transfers it along multiple paths to expand the seed taxonomy structure in width and depth.
A concept learning module enriches the semantics of each concept node by jointly embedding the taxonomy.
arXiv Detail & Related papers (2020-10-13T22:00:31Z)
- Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding [37.7780399311715]
Hierarchical Topic Mining aims to mine a set of representative terms for each category from a text corpus to help users comprehend their topics of interest.
Our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
arXiv Detail & Related papers (2020-07-18T23:30:47Z)
- Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet, for Online Catalog EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
Octet enriches an online catalog in production to twice its original size in the open-world evaluation.
arXiv Detail & Related papers (2020-06-18T04:53:07Z)