A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion
- URL: http://arxiv.org/abs/2402.13405v4
- Date: Thu, 15 Aug 2024 00:24:03 GMT
- Title: A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion
- Authors: Yanzhen Shen, Yu Zhang, Yunyi Zhang, Jiawei Han,
- Abstract summary: We identify two common skills needed for entity set expansion, taxonomy expansion, and seed-guided taxonomy construction: finding "siblings" and finding "parents"
We propose a taxonomy-guided instruction tuning framework to teach a large language model to generate siblings and parents for query entities.
- Score: 32.700682888513946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity set expansion, taxonomy expansion, and seed-guided taxonomy construction are three representative tasks that can be applied to automatically populate an existing taxonomy with emerging concepts. Previous studies view them as three separate tasks. Therefore, their proposed techniques usually work for one specific task only, lacking generalizability and a holistic perspective. In this paper, we aim at a unified solution to the three tasks. To be specific, we identify two common skills needed for entity set expansion, taxonomy expansion, and seed-guided taxonomy construction: finding "siblings" and finding "parents". We propose a taxonomy-guided instruction tuning framework to teach a large language model to generate siblings and parents for query entities, where the joint pre-training process facilitates the mutual enhancement of the two skills. Extensive experiments on multiple benchmark datasets demonstrate the efficacy of our proposed TaxoInstruct framework, which outperforms task-specific baselines across all three tasks.
Related papers
- CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts [40.52605902842168]
textscCodeTaxo is a novel approach that leverages large language models through code language prompts to capture the taxonomic structure.
Experiments on five real-world benchmarks from different domains demonstrate that textscCodeTaxo consistently achieves superior performance across all evaluation metrics.
arXiv Detail & Related papers (2024-08-17T02:15:07Z) - Improving Retrieval in Theme-specific Applications using a Corpus
Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z) - Creating a Fine Grained Entity Type Taxonomy Using LLMs [0.0]
This study investigates the potential of GPT-4 and its advanced iteration, GPT-4 Turbo, in autonomously developing a detailed entity type taxonomy.
Our objective is to construct a comprehensive taxonomy, starting from a broad classification of entity types.
This classification is then progressively refined through iterative prompting techniques, leveraging GPT-4's internal knowledge base.
arXiv Detail & Related papers (2024-02-19T21:32:19Z) - Towards Visual Taxonomy Expansion [50.462998483087915]
We propose Visual Taxonomy Expansion (VTE), introducing visual features into the taxonomy expansion task.
We propose a textual hypernymy learning task and a visual prototype learning task to cluster textual and visual semantics.
Our method is evaluated on two datasets, where we obtain compelling results.
arXiv Detail & Related papers (2023-09-12T10:17:28Z) - TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic
Representations [28.65753036636082]
We propose a new taxonomy completion framework, which effectively leverages both semantic features and structural information in the existing taxonomy.
TaxoEnrich consists of four components: (1) taxonomy-contextualized embedding which incorporates both semantic meanings of concept and taxonomic relations based on powerful pretrained language models; (2) a taxonomy-aware sequential encoder which learns candidate position representations by encoding the structural information of taxonomy.
Experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance among all evaluation metrics and outperforms previous state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-02-10T08:10:43Z) - TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel
Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom not only generates the high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z) - Who Should Go First? A Self-Supervised Concept Sorting Model for
Improving Taxonomy Expansion [50.794640012673064]
As data and business scope grow in real applications, existing need to be expanded to incorporate new concepts.
Previous works on taxonomy expansion process the new concepts independently and simultaneously, ignoring the potential relationships among them and the appropriate order of inserting operations.
We propose TaxoOrder, a novel self-supervised framework that simultaneously discovers the local hypernym-hyponym structure among new concepts and decides the order of insertion.
arXiv Detail & Related papers (2021-04-08T11:00:43Z) - Can Taxonomy Help? Improving Semantic Question Matching using Question
Taxonomy [37.57300969050908]
We propose a hybrid technique for semantic question matching.
It uses our proposed two-layered taxonomy for English questions by augmenting state-of-the-art deep learning models with question classes obtained from a deep learning based question.
arXiv Detail & Related papers (2021-01-20T16:23:04Z) - CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and
Relation Transferring [37.1330815281983]
We propose a method for seed-guided topical taxonomy construction, which takes a corpus and a seed taxonomy described by concept names as input.
A relation transferring module learns and transfers the user's interested relation along multiple paths to expand the seed taxonomy structure in width and depth.
A concept learning module enriches the semantics of each concept node by jointly embedding the taxonomy.
arXiv Detail & Related papers (2020-10-13T22:00:31Z) - STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths [53.45704816829921]
We propose a self-supervised taxonomy expansion model named STEAM.
STEAM generates natural self-supervision signals, and formulates a node attachment prediction task.
Experiments show STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6% in accuracy and 7.0% in mean reciprocal rank.
arXiv Detail & Related papers (2020-06-18T00:32:53Z) - TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced
Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.