PIVOINE: Instruction Tuning for Open-world Information Extraction
- URL: http://arxiv.org/abs/2305.14898v1
- Date: Wed, 24 May 2023 08:52:08 GMT
- Title: PIVOINE: Instruction Tuning for Open-world Information Extraction
- Authors: Keming Lu, Xiaoman Pan, Kaiqiang Song, Hongming Zhang, Dong Yu,
Jianshu Chen
- Abstract summary: We consider the problem of Open-world Information Extraction (Open-world IE), which extracts comprehensive entity profiles from unstructured texts.
We develop a large language model (LLM) that is able to perform Open-world IE to extract desirable entity profiles characterized by (possibly fine-grained) natural language instructions.
In particular, we construct INSTRUCTOPENWIKI, a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.
- Score: 53.98073623222221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Open-world Information Extraction (Open-world IE),
which extracts comprehensive entity profiles from unstructured texts. Different
from the conventional closed-world setting of Information Extraction (IE),
Open-world IE considers a more general situation where entities and relations
could be beyond a predefined ontology. More importantly, we seek to develop a
large language model (LLM) that is able to perform Open-world IE to extract
desirable entity profiles characterized by (possibly fine-grained) natural
language instructions. We achieve this by finetuning LLMs using instruction
tuning. In particular, we construct INSTRUCTOPENWIKI, a substantial instruction
tuning dataset for Open-world IE enriched with a comprehensive corpus,
extensive annotations, and diverse instructions. We finetune the pretrained
BLOOM models on INSTRUCTOPENWIKI and obtain PIVOINE, an LLM for Open-world IE
with strong instruction-following capabilities. Our experiments demonstrate
that PIVOINE significantly outperforms traditional closed-world methods and
other LLM baselines, displaying impressive generalization capabilities on both
unseen instructions and out-of-ontology cases. Consequently, PIVOINE emerges as
a promising solution to tackle the open-world challenge in IE effectively.
Related papers
- Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks [12.400599440431188]
Information Extraction (IE) plays a crucial role in Natural Language Processing (NLP)
Recent experiments focusing on English IE tasks have shed light on the challenges faced by Large Language Models (LLMs) in achieving optimal performance.
arXiv Detail & Related papers (2024-06-04T08:00:40Z) - ADELIE: Aligning Large Language Models on Information Extraction [55.60192044049083]
Large language models (LLMs) usually fall short on information extraction tasks.
In this paper, we introduce ADELIE, an aligned LLM that effectively solves various IE tasks.
We show that our models achieve state-of-the-art (SoTA) performance among open-source models.
arXiv Detail & Related papers (2024-05-08T12:24:52Z) - Bayesian Preference Elicitation with Language Models [82.58230273253939]
We introduce OPEN, a framework that uses BOED to guide the choice of informative questions and an LM to extract features.
In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.
arXiv Detail & Related papers (2024-03-08T18:57:52Z) - IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus [38.27122981449957]
IEPile is a comprehensive bilingual (English and Chinese) IE instruction corpus, which contains approximately 0.32B tokens.
We construct IEPile by collecting and cleaning 33 existing IE datasets, and introduce schema-based instruction generation to unearth a large-scale corpus.
Experimentally, IEPile enhance the performance of LLMs for IE, with notable improvements in zero-shot generalization.
arXiv Detail & Related papers (2024-02-22T17:11:38Z) - Instruct and Extract: Instruction Tuning for On-Demand Information
Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users.
We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set.
Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z) - LasUIE: Unifying Information Extraction with Latent Adaptive
Structure-aware Generative Language Model [96.889634747943]
Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential.
We propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE.
Over 12 IE benchmarks across 7 tasks our system shows significant improvements over the baseline UIE system.
arXiv Detail & Related papers (2023-04-13T04:01:14Z) - Unified Structure Generation for Universal Information Extraction [58.89057387608414]
UIE can universally model different IE tasks, adaptively generate targeted structures, and collaboratively learn general IE abilities from different knowledge sources.
Experiments show that UIE achieved the state-of-the-art performance on 4 IE tasks, 13 datasets, and on all supervised, low-resource, and few-shot settings.
arXiv Detail & Related papers (2022-03-23T08:49:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.