InstructIE: A Bilingual Instruction-based Information Extraction Dataset
- URL: http://arxiv.org/abs/2305.11527v3
- Date: Thu, 18 Apr 2024 16:20:19 GMT
- Title: InstructIE: A Bilingual Instruction-based Information Extraction Dataset
- Authors: Honghao Gui, Shuofei Qiao, Jintian Zhang, Hongbin Ye, Mengshu Sun, Lei Liang, Jeff Z. Pan, Huajun Chen, Ningyu Zhang
- Abstract summary: Large language models perform well on general natural language tasks, but they remain suboptimal at information extraction.
Recent work attributes this mainly to the scarcity of instruction data for information extraction.
We introduce InstructIE, a bilingual instruction-based information extraction dataset, which covers 12 diverse domains.
- Score: 44.65162892808696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models perform well on general natural language tasks, but their effectiveness at information extraction remains suboptimal. Recent work attributes this mainly to the lack of extensive instruction data for information extraction. Moreover, the existing instruction datasets for information extraction not only have limited coverage but also involve high construction costs. To address this issue, we introduce InstructIE, a bilingual instruction-based information extraction dataset covering 12 diverse domains. Specifically, we propose KG2Instruction, a framework for automatically generating such datasets. Experimental results demonstrate that large language models trained with InstructIE not only obtain better information extraction capabilities but also improve zero-shot performance over baselines.
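The abstract does not detail KG2Instruction's pipeline, but the core idea of instruction-based IE data is straightforward: pair source text with an extraction instruction and a structured answer. A minimal sketch, using illustrative field names and a hypothetical schema rather than the actual KG2Instruction format:

```python
import json

# Hypothetical sketch: turn a sentence plus knowledge-graph triples aligned
# with it into an instruction-style IE training sample. The field names
# ("instruction", "input", "output") and the triple schema are assumptions,
# not the dataset's actual format.
def make_instruction_sample(text, triples, relation_types):
    instruction = (
        "Extract all (head, relation, tail) triples from the text. "
        f"Allowed relations: {', '.join(relation_types)}."
    )
    return {
        "instruction": instruction,
        "input": text,
        "output": json.dumps(triples, ensure_ascii=False),
    }

sample = make_instruction_sample(
    "Alan Turing was born in London.",
    [{"head": "Alan Turing", "relation": "place_of_birth", "tail": "London"}],
    ["place_of_birth", "occupation"],
)
```

Aligning KG triples with text this way is what makes large-scale automatic construction cheap compared with manual annotation.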
Related papers
- YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction [20.32778991187863]
We propose an end-to-end chat-enhanced instruction tuning framework for universal information extraction (YAYI-UIE).
Specifically, we utilize dialogue data and information extraction data to enhance the information extraction performance jointly.
arXiv Detail & Related papers (2023-12-24T21:33:03Z)
- Instruct and Extract: Instruction Tuning for On-Demand Information Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users.
We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set.
Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z)
- Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction [46.09887436555637]
This paper introduces a fine-grained IE benchmark dataset tailored for Large Language Models (LLMs)
Through extensive evaluations, we observe that encoder-decoder models, particularly T5 and FLAN-T5, perform well in generalizing to unseen information types.
arXiv Detail & Related papers (2023-10-08T09:41:18Z)
- From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models [6.520584613661788]
We construct a Japanese instruction dataset by expanding and filtering existing datasets.
We perform Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models.
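LoRA's central trick is to freeze the pretrained weight matrix and train only a low-rank update. A minimal NumPy sketch of the idea (dimensions and initialization chosen for illustration; real LoRA applies this to attention projections inside a transformer):

```python
import numpy as np

# Minimal sketch of Low-Rank Adaptation (LoRA): instead of updating a full
# weight matrix W (d_out x d_in), train two small matrices B (d_out x r) and
# A (r x d_in), so the adapted layer computes (W + B @ A) @ x.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, low rank
B = np.zeros((d_out, r))               # trainable, initialized to zero

def lora_forward(x):
    # Because B starts at zero, the adapted layer initially matches the
    # frozen base layer exactly.
    return (W + B @ A) @ x

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)
```

With rank r = 2, the trainable parameters number r * (d_in + d_out) = 32 instead of d_in * d_out = 64; at transformer scale this gap is what makes LoRA tuning cheap.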
arXiv Detail & Related papers (2023-09-07T00:14:37Z)
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor [48.116843121810135]
We introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor.
We collect 64,000 examples by prompting a language model with three seed examples of instructions and eliciting a fourth.
This set is then expanded by prompting the model to rephrase each instruction, creating a total of approximately 240,000 examples of instructions, inputs, and outputs.
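The two-stage procedure described above (elicit new instructions from seed examples, then paraphrase to expand) can be sketched as a simple loop. `query_model` is a hypothetical stand-in for a real LLM API call, not part of the paper's released code:

```python
# Sketch of the Unnatural Instructions collection loop: prompt a model with
# three seed instruction examples to elicit a fourth, then expand the set by
# asking the model to paraphrase each collected instruction.
def query_model(prompt):
    # Placeholder: a real implementation would call an LLM API here.
    return "Summarize the following article in one sentence."

def collect_instructions(seed_examples, n_samples):
    collected = []
    for _ in range(n_samples):
        prompt = "\n\n".join(f"Example: {s}" for s in seed_examples) + "\n\nExample:"
        collected.append(query_model(prompt))
    return collected

def expand_by_paraphrase(instructions, n_paraphrases=3):
    expanded = list(instructions)
    for inst in instructions:
        for _ in range(n_paraphrases):
            expanded.append(query_model(f"Rephrase this instruction: {inst}"))
    return expanded
```

With three paraphrases per instruction, the expansion step roughly quadruples the collection, matching the paper's jump from 64,000 elicited examples to about 240,000 total.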
arXiv Detail & Related papers (2022-12-19T18:21:00Z)
- Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization [68.91386402390403]
We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness in various scenarios of tasks and datasets.
arXiv Detail & Related papers (2022-10-17T15:25:24Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge can exceed what is needed, since the output description may cover only the most significant facts.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.