UniEX: An Effective and Efficient Framework for Unified Information
Extraction via a Span-extractive Perspective
- URL: http://arxiv.org/abs/2305.10306v3
- Date: Mon, 22 May 2023 05:25:33 GMT
- Title: UniEX: An Effective and Efficient Framework for Unified Information
Extraction via a Span-extractive Perspective
- Authors: Ping Yang, Junyu Lu, Ruyi Gan, Junjie Wang, Yuxiang Zhang, Jiaxing
Zhang, Pingjian Zhang
- Abstract summary: We propose a new paradigm for universal information extraction (IE) that is compatible with any schema format.
Our approach converts text-based IE tasks into a token-pair problem, which uniformly decomposes all extraction targets into joint span detection, classification and association problems.
Experimental results show that UniEX can outperform generative universal IE models in both performance and inference speed.
- Score: 11.477764739452702
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a new paradigm for universal information extraction (IE) that is
compatible with any schema format and applicable to a list of IE tasks, such as
named entity recognition, relation extraction, event extraction and sentiment
analysis. Our approach converts text-based IE tasks into a token-pair
problem, which uniformly decomposes all extraction targets into joint span
detection, classification and association problems with a unified extractive
framework, namely UniEX. UniEX can synchronously encode schema-based prompt and
textual information, and collaboratively learn the generalized knowledge from
pre-defined information using auto-encoder language models. We develop a
triaffine attention mechanism to integrate heterogeneous factors including
tasks, labels and inside tokens, and obtain the extraction targets via a scoring
matrix. Experimental results show that UniEX can outperform generative universal
IE models in both performance and inference speed on 14 benchmark IE
datasets in the supervised setting. The state-of-the-art performance in
low-resource scenarios also verifies the transferability and effectiveness of
UniEX.
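The token-pair idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes one common form of triaffine scoring, in which a weight tensor W combines a start-token vector, a label (schema) vector and an end-token vector into one score per (label, start, end) triple. Spans are then read off the resulting scoring matrix. The function names, the toy vectors and the zero threshold are all hypothetical choices made for illustration.

```python
def triaffine_scores(labels, starts, ends, W):
    """Return scores[l][i][j] = sum_{a,b,c} W[a][b][c] * starts[i][a] * labels[l][b] * ends[j][c].

    labels: one vector per schema label (assumed to come from the encoded prompt)
    starts, ends: one vector per text token (assumed start/end projections)
    W: d x d x d weight tensor (hypothetical; learned in a real model)
    """
    d = len(W)
    scores = []
    for q in labels:
        # Contract the label vector first: M[a][c] = sum_b W[a][b][c] * q[b]
        M = [[sum(W[a][b][c] * q[b] for b in range(d)) for c in range(d)]
             for a in range(d)]
        plane = []
        for s in starts:
            # v[c] = sum_a s[a] * M[a][c]
            v = [sum(s[a] * M[a][c] for a in range(d)) for c in range(d)]
            # One score per candidate end token
            plane.append([sum(v[c] * e[c] for c in range(d)) for e in ends])
        scores.append(plane)
    return scores


def decode_spans(scores, threshold=0.0):
    """Collect (label, start, end) triples whose score exceeds the threshold.

    A raw score above 0 corresponds to a sigmoid probability above 0.5.
    Only pairs with end >= start are valid spans.
    """
    spans = []
    for l, plane in enumerate(scores):
        for i, row in enumerate(plane):
            for j, s in enumerate(row):
                if j >= i and s > threshold:
                    spans.append((l, i, j))
    return spans


# Toy example: 1 label, 3 tokens, feature dimension 2.
W = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
W[0][0][0] = 1.0  # only the first feature of each factor interacts
labels = [[1.0, 0.0]]
starts = [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # token 0 looks like a span start
ends = [[0.0, 0.0], [3.0, 0.0], [0.0, 0.0]]    # token 1 looks like a span end
scores = triaffine_scores(labels, starts, ends, W)
spans = decode_spans(scores)
print(spans)  # [(0, 0, 1)]
```

In this toy setup the only positive entry of the scoring matrix is the one pairing token 0 (start) with token 1 (end) under label 0, so a single span is extracted. Scoring every token pair for every label in one pass, rather than generating the output token by token, is the kind of property the abstract's inference-speed claim refers to.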
Related papers
- LOKE: Linked Open Knowledge Extraction for Automated Knowledge Graph
Construction [0.0]
We investigate the use of GPT models and prompt engineering for knowledge graph construction with the Wikidata knowledge graph.
We show that a well engineered prompt, paired with a naive entity linking approach (which we call LOKE-GPT) outperforms AllenAI's OpenIE 4 implementation on the OKE task.
arXiv Detail & Related papers (2023-11-15T20:57:44Z)
- GIELLM: Japanese General Information Extraction Large Language Model
Utilizing Mutual Reinforcement Effect [0.0]
We introduce the General Information Extraction Large Language Model (GIELLM).
It integrates text Classification, Sentiment Analysis, Named Entity Recognition, Relation Extraction, and Event Extraction using a uniform input-output schema.
This innovation marks the first instance of a model simultaneously handling such a diverse array of IE subtasks.
arXiv Detail & Related papers (2023-11-12T13:30:38Z)
- Instruct and Extract: Instruction Tuning for On-Demand Information
Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users.
We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set.
Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z)
- Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Universal Information Extraction as Unified Semantic Matching [54.19974454019611]
We decouple information extraction into two abilities, structuring and conceptualizing, which are shared by different tasks and schemas.
Based on this paradigm, we propose to universally model various IE tasks with Unified Semantic Matching framework.
In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand.
arXiv Detail & Related papers (2023-01-09T11:51:31Z)
- Unified Structure Generation for Universal Information Extraction [58.89057387608414]
UIE can universally model different IE tasks, adaptively generate targeted structures, and collaboratively learn general IE abilities from different knowledge sources.
Experiments show that UIE achieved the state-of-the-art performance on 4 IE tasks, 13 datasets, and on all supervised, low-resource, and few-shot settings.
arXiv Detail & Related papers (2022-03-23T08:49:29Z)
- Towards Robust Visual Information Extraction in Real World: New Dataset
and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information
Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
- SciREX: A Challenge Dataset for Document-Level Information Extraction [56.83748634747753]
It is challenging to create a large-scale information extraction dataset at the document level.
We introduce SciREX, a document level IE dataset that encompasses multiple IE tasks.
We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE.
arXiv Detail & Related papers (2020-05-01T17:30:10Z)