Who Should Go First? A Self-Supervised Concept Sorting Model for
Improving Taxonomy Expansion
- URL: http://arxiv.org/abs/2104.03682v2
- Date: Sat, 10 Apr 2021 08:14:51 GMT
- Title: Who Should Go First? A Self-Supervised Concept Sorting Model for
Improving Taxonomy Expansion
- Authors: Xiangchen Song, Jiaming Shen, Jieyu Zhang, and Jiawei Han
- Abstract summary: As data and business scope grow in real applications, existing taxonomies need to be expanded to incorporate new concepts.
Previous works on taxonomy expansion process the new concepts independently and simultaneously, ignoring the potential relationships among them and the appropriate order of inserting operations.
We propose TaxoOrder, a novel self-supervised framework that simultaneously discovers the local hypernym-hyponym structure among new concepts and decides the order of insertion.
- Score: 50.794640012673064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Taxonomies have been widely used in various machine learning and text mining
systems to organize knowledge and facilitate downstream tasks. One critical
challenge is that, as data and business scope grow in real applications,
existing taxonomies need to be expanded to incorporate new concepts. Previous
works on taxonomy expansion process the new concepts independently and
simultaneously, ignoring the potential relationships among them and the
appropriate order of inserting operations. However, in reality, the new
concepts tend to be mutually correlated and form local hypernym-hyponym
structures. In such a scenario, ignoring the dependencies of new concepts and
the order of insertion may trigger error propagation. For example, existing
taxonomy expansion systems may insert hyponyms to existing taxonomies before
their hypernym, leading to sub-optimal expanded taxonomies. To complement
existing taxonomy expansion systems, we propose TaxoOrder, a novel
self-supervised framework that simultaneously discovers the local
hypernym-hyponym structure among new concepts and decides the order of
insertion. TaxoOrder can be directly plugged into any taxonomy expansion system
and improve the quality of expanded taxonomies. Experiments on the real-world
dataset validate the effectiveness of TaxoOrder in enhancing taxonomy expansion
systems, yielding better taxonomies than the baselines
under various evaluation metrics.
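To illustrate why insertion order matters, here is a minimal sketch that orders new concepts so each hypernym is inserted before its hyponyms. It is not TaxoOrder itself: the abstract does not describe the model's internals, so this assumes the local hypernym-hyponym pairs among the new concepts are already known and simply applies a topological sort. The function name `insertion_order` and the example concepts are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def insertion_order(new_concepts, hypernym_edges):
    """Order new concepts so every hypernym precedes its hyponyms.

    new_concepts:   iterable of concept names to insert.
    hypernym_edges: iterable of (hypernym, hyponym) pairs among the new
                    concepts. Assumed given here; discovering them is the
                    part a learned model would have to do.
    """
    # TopologicalSorter expects a mapping: node -> set of predecessors.
    predecessors = {concept: set() for concept in new_concepts}
    for hypernym, hyponym in hypernym_edges:
        predecessors.setdefault(hyponym, set()).add(hypernym)
        predecessors.setdefault(hypernym, set())
    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(predecessors).static_order())


if __name__ == "__main__":
    concepts = ["ResNet", "convolutional neural network", "deep learning"]
    edges = [
        ("deep learning", "convolutional neural network"),
        ("convolutional neural network", "ResNet"),
    ]
    print(insertion_order(concepts, edges))
    # ['deep learning', 'convolutional neural network', 'ResNet']
```

Each concept in the returned list can then be handed, in order, to whatever taxonomy expansion system is in use, so a hyponym is never attached before its hypernym exists in the taxonomy.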
Related papers
- CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts [40.52605902842168]
CodeTaxo is a novel approach that leverages large language models through code language prompts to capture the taxonomic structure.
Experiments on five real-world benchmarks from different domains demonstrate that CodeTaxo consistently achieves superior performance across all evaluation metrics.
arXiv Detail & Related papers (2024-08-17T02:15:07Z)
- Insert or Attach: Taxonomy Completion via Box Embedding [75.69894194912595]
Previous approaches embed concepts as vectors in Euclidean space, which makes it difficult to model asymmetric relations in taxonomy.
We develop a framework, TaxBox, that leverages box containment and center closeness to design two specialized geometric scorers within the box embedding space.
These scorers are tailored for insertion and attachment operations and can effectively capture intrinsic relationships between concepts.
arXiv Detail & Related papers (2023-05-18T14:34:58Z)
- Learning What You Need from What You Did: Product Taxonomy Expansion with User Behaviors Supervision [21.649258076884927]
We present a self-supervised and user behavior-oriented product taxonomy expansion framework to append new concepts into an existing taxonomy.
Our framework extracts hyponymy relations that conform to users' intentions and cognition.
Our method enlarges the size of real-world product taxonomies from 39,263 to 94,698 relations with 88% semantic precision.
arXiv Detail & Related papers (2022-03-28T17:17:50Z)
- TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations [28.65753036636082]
We propose a new taxonomy completion framework, which effectively leverages both semantic features and structural information in the existing taxonomy.
TaxoEnrich consists of four components, including (1) a taxonomy-contextualized embedding that incorporates both the semantic meanings of concepts and taxonomic relations based on powerful pretrained language models, and (2) a taxonomy-aware sequential encoder that learns candidate position representations by encoding the structural information of the taxonomy.
Experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance among all evaluation metrics and outperforms previous state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-02-10T08:10:43Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet, for Online Catalog EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
Octet enriches an online catalog in production to 2 times larger in the open-world evaluation.
arXiv Detail & Related papers (2020-06-18T04:53:07Z)
- STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths [53.45704816829921]
We propose a self-supervised taxonomy expansion model named STEAM.
STEAM generates natural self-supervision signals, and formulates a node attachment prediction task.
Experiments show STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6% in accuracy and 7.0% in mean reciprocal rank.
arXiv Detail & Related papers (2020-06-18T00:32:53Z)
- TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)
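The self-supervision idea in the TaxoExpan entry above (deriving training pairs from the existing taxonomy) can be sketched roughly as follows. This is an illustrative assumption, not the paper's procedure: each existing node is treated as a query, its parent as the positive anchor, and a few other nodes are sampled at random as negative anchors. The function name `make_training_pairs`, the `parent_of` mapping, and the negative-sampling scheme are all hypothetical.

```python
import random


def make_training_pairs(parent_of, num_negatives=2, seed=0):
    """Build (query, anchor, label) triples from an existing taxonomy.

    parent_of: dict mapping each non-root concept to its parent concept.
    Label 1 marks the true parent; label 0 marks a randomly sampled
    non-parent (an assumed negative-sampling scheme for illustration).
    """
    rng = random.Random(seed)
    nodes = sorted(set(parent_of) | set(parent_of.values()))
    triples = []
    for query, true_anchor in parent_of.items():
        # Positive example: the concept's actual parent in the taxonomy.
        triples.append((query, true_anchor, 1))
        # Negative examples: any other node that is not the true parent.
        candidates = [n for n in nodes if n not in (query, true_anchor)]
        k = min(num_negatives, len(candidates))
        for negative_anchor in rng.sample(candidates, k):
            triples.append((query, negative_anchor, 0))
    return triples


if __name__ == "__main__":
    taxonomy = {
        "machine learning": "computer science",
        "deep learning": "machine learning",
        "databases": "computer science",
    }
    for triple in make_training_pairs(taxonomy):
        print(triple)
```

Because the labels come from the taxonomy's own edges, no manual annotation is needed; a model trained on such triples can then score candidate anchors for concepts not yet in the taxonomy.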