Octet: Online Catalog Taxonomy Enrichment with Self-Supervision
- URL: http://arxiv.org/abs/2006.10276v1
- Date: Thu, 18 Jun 2020 04:53:07 GMT
- Title: Octet: Online Catalog Taxonomy Enrichment with Self-Supervision
- Authors: Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong,
Christos Faloutsos, Jiawei Han
- Abstract summary: We present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
Octet enriches an online catalog taxonomy in production to twice its original size in the open-world evaluation.
- Score: 67.26804972901952
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Taxonomies have found wide applications in various domains, especially online
for item categorization, browsing, and search. Despite the prevalent use of
online catalog taxonomies, most of them in practice are maintained by humans,
which is labor-intensive and difficult to scale. While taxonomy construction
from scratch is considerably studied in the literature, how to effectively
enrich existing incomplete taxonomies remains an open yet important research
question. Taxonomy enrichment not only requires the robustness to deal with
emerging terms but also the consistency between existing taxonomy structure and
new term attachment. In this paper, we present a self-supervised end-to-end
framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages
heterogeneous information unique to online catalog taxonomies such as user
queries, items, and their relations to the taxonomy nodes while requiring no
other supervision than the existing taxonomies. We propose to distantly train a
sequence labeling model for term extraction and employ graph neural networks
(GNNs) to capture the taxonomy structure as well as the query-item-taxonomy
interactions for term attachment. Extensive experiments in different online
domains demonstrate the superiority of Octet over state-of-the-art methods via
both automatic and human evaluations. Notably, Octet enriches an online catalog
taxonomy in production to 2 times larger in the open-world evaluation.
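As a rough illustration of what "distantly training" a sequence labeling model can mean, the sketch below builds BIO labels for item titles by matching tokens against terms that already exist in the taxonomy, so no manual annotation is needed. This is a minimal sketch under stated assumptions, not the procedure from the paper: the function name `distant_bio_labels`, the greedy longest-match rule, and the toy terms and title are all hypothetical.

```python
from typing import List, Set, Tuple


def distant_bio_labels(
    tokens: List[str],
    terms: Set[Tuple[str, ...]],
    max_len: int = 4,
) -> List[str]:
    """Tag tokens with B/I/O by greedy longest match against known taxonomy terms.

    Assumption: matching is case-insensitive and exact on whitespace tokens.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest span first so that "green tea" wins over "tea".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                matched = n
                break
        if matched:
            labels[i] = "B"
            for j in range(i + 1, i + matched):
                labels[j] = "I"
            i += matched
        else:
            i += 1
    return labels


# Hypothetical taxonomy terms and item title, for illustration only.
taxonomy_terms = {("tea",), ("green", "tea"), ("coffee",)}
title = "Organic Green Tea Bags 100 Count".split()
labels = distant_bio_labels(title, taxonomy_terms)
for token, label in zip(title, labels):
    print(f"{token}\t{label}")
```

Labels produced this way are noisy (terms missing from the taxonomy are tagged `O`), which is why such supervision is usually described as distant rather than gold.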
Related papers
- Creating a Fine Grained Entity Type Taxonomy Using LLMs [0.0]
This study investigates the potential of GPT-4 and its advanced iteration, GPT-4 Turbo, in autonomously developing a detailed entity type taxonomy.
Our objective is to construct a comprehensive taxonomy, starting from a broad classification of entity types.
This classification is then progressively refined through iterative prompting techniques, leveraging GPT-4's internal knowledge base.
arXiv Detail & Related papers (2024-02-19T21:32:19Z)
- Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples [34.88498567698853]
Chain-of-Layer is an in-context learning framework designed to induce taxonomies from a given set of entities.
We show that Chain-of-Layer achieves state-of-the-art performance on four real-world benchmarks.
arXiv Detail & Related papers (2024-02-12T03:05:54Z)
- TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations [28.65753036636082]
We propose a new taxonomy completion framework, which effectively leverages both semantic features and structural information in the existing taxonomy.
TaxoEnrich consists of four components, including (1) a taxonomy-contextualized embedding, which incorporates both the semantic meanings of concepts and taxonomic relations based on powerful pretrained language models, and (2) a taxonomy-aware sequential encoder, which learns candidate position representations by encoding the structural information of the taxonomy.
Experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance among all evaluation metrics and outperforms previous state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-02-10T08:10:43Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- Who Should Go First? A Self-Supervised Concept Sorting Model for Improving Taxonomy Expansion [50.794640012673064]
As data and business scope grow in real applications, existing taxonomies need to be expanded to incorporate new concepts.
Previous works on taxonomy expansion process the new concepts independently and simultaneously, ignoring the potential relationships among them and the appropriate order of insertion operations.
We propose TaxoOrder, a novel self-supervised framework that simultaneously discovers the local hypernym-hyponym structure among new concepts and decides the order of insertion.
arXiv Detail & Related papers (2021-04-08T11:00:43Z)
- Studying Taxonomy Enrichment on Diachronic WordNet Versions [70.27072729280528]
We explore the possibilities of taxonomy extension in a resource-poor setting and present methods which are applicable to a large number of languages.
We create novel English and Russian datasets for training and evaluating taxonomy enrichment models and describe a technique of creating such datasets for other languages.
arXiv Detail & Related papers (2020-11-23T16:49:37Z)
- STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths [53.45704816829921]
We propose a self-supervised taxonomy expansion model named STEAM.
STEAM generates natural self-supervision signals, and formulates a node attachment prediction task.
Experiments show STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6% in accuracy and 7.0% in mean reciprocal rank.
arXiv Detail & Related papers (2020-06-18T00:32:53Z)
- TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)
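
Several of the entries above describe self-supervision in which training pairs are derived from the existing taxonomy itself, and STEAM reports results in mean reciprocal rank. The sketch below illustrates both ideas on a toy taxonomy. It is a simplified sketch under stated assumptions, not the method of any listed paper: the names `make_training_pairs` and `mean_reciprocal_rank`, the uniform negative sampling, and the toy data are all hypothetical.

```python
import random
from typing import Dict, List, Tuple


def make_training_pairs(
    parent_of: Dict[str, str],
    num_negatives: int,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Build (query, anchor, label) triples from child -> parent edges.

    Each existing edge yields one positive (label 1). Negatives (label 0)
    are sampled uniformly from nodes that are neither the query nor its
    true parent (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    nodes = sorted(set(parent_of) | set(parent_of.values()))
    triples: List[Tuple[str, str, int]] = []
    for child, parent in sorted(parent_of.items()):
        triples.append((child, parent, 1))
        candidates = [n for n in nodes if n not in (child, parent)]
        k = min(num_negatives, len(candidates))
        for negative in rng.sample(candidates, k):
            triples.append((child, negative, 0))
    return triples


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    gold_parent: Dict[str, str],
) -> float:
    """Average of 1/rank of the true parent; 0 when it is not in the ranking."""
    total = 0.0
    for query, ranked in rankings.items():
        gold = gold_parent[query]
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(rankings) if rankings else 0.0


# Hypothetical toy taxonomy (child -> parent), for illustration only.
toy_taxonomy = {
    "green tea": "tea",
    "black tea": "tea",
    "tea": "beverages",
    "coffee": "beverages",
}
triples = make_training_pairs(toy_taxonomy, num_negatives=2)

# Hypothetical model output: candidate parents ranked per query.
rankings = {
    "green tea": ["tea", "coffee"],
    "coffee": ["tea", "beverages"],
}
mrr = mean_reciprocal_rank(rankings, toy_taxonomy)
print(len(triples), mrr)
```

In the usage above, the true parent is ranked first for one query and second for the other, so the reciprocal ranks are 1 and 1/2.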