PPN: Parallel Pointer-based Network for Key Information Extraction with
Complex Layouts
- URL: http://arxiv.org/abs/2307.10551v1
- Date: Thu, 20 Jul 2023 03:29:09 GMT
- Title: PPN: Parallel Pointer-based Network for Key Information Extraction with
Complex Layouts
- Authors: Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao,
Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang
- Abstract summary: Key Information Extraction is a challenging task that aims to extract structured value semantic entities from documents.
Existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem.
We introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios.
- Score: 29.73609439825548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Key Information Extraction (KIE) is a challenging multimodal task that aims
to extract structured value semantic entities from visually rich documents.
Although significant progress has been made, there are still two major
challenges that need to be addressed. Firstly, the layout of existing datasets
is relatively fixed and limited in the number of semantic entity categories,
creating a significant gap between these datasets and the complex real-world
scenarios. Secondly, existing methods follow a two-stage pipeline strategy,
which may lead to the error propagation problem. Additionally, they are
difficult to apply in situations where unseen semantic entity categories
emerge. To address the first challenge, we propose a new large-scale
human-annotated dataset named Complex Layout form for key information
EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity
categories. To solve the second challenge, we introduce Parallel Pointer-based
Network (PPN), an end-to-end model that can be applied in zero-shot and
few-shot scenarios. PPN leverages the implicit clues between semantic entities
to assist extraction, and its parallel extraction mechanism allows it to
extract multiple results simultaneously and efficiently. Experiments on the
CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods
while also offering a much faster inference speed.
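The abstract does not describe PPN's architecture in detail, so the following is only a minimal sketch of generic pointer-style span decoding, not the authors' implementation. It assumes a hypothetical model has already produced one row of start scores and one row of end scores per queried entity category; the function names, the score values, and the `max_len` limit are all illustrative assumptions. Because each category's row is decoded independently, all categories can be handled in a single pass, which is the intuition behind extracting multiple results at once.

```python
from typing import Dict, List, Tuple


def decode_spans(
    start_scores: List[List[float]],
    end_scores: List[List[float]],
    max_len: int = 8,
) -> List[Tuple[int, int]]:
    """Pick the highest-scoring (start, end) token span for each query row.

    Each row corresponds to one queried entity category (an assumption of
    this sketch). Rows do not depend on each other, so a real model could
    decode them as one batched tensor operation.
    """
    spans = []
    for s_row, e_row in zip(start_scores, end_scores):
        best, best_score = (0, 0), float("-inf")
        for i, s in enumerate(s_row):
            # Only consider ends at or after the start, within max_len tokens.
            for j in range(i, min(i + max_len, len(e_row))):
                score = s + e_row[j]
                if score > best_score:
                    best, best_score = (i, j), score
        spans.append(best)
    return spans


def extract(
    tokens: List[str],
    categories: List[str],
    start_scores: List[List[float]],
    end_scores: List[List[float]],
) -> Dict[str, str]:
    """Map each category name to the text of its decoded span."""
    spans = decode_spans(start_scores, end_scores)
    return {
        cat: " ".join(tokens[i : j + 1])
        for cat, (i, j) in zip(categories, spans)
    }


# Hand-written toy scores standing in for model output (hypothetical values).
tokens = ["Name", ":", "Alice", "Wong", "Date", ":", "2023-07-20"]
categories = ["name", "date"]
start_scores = [
    [0.1, 0.0, 2.0, 0.3, 0.1, 0.0, 0.2],
    [0.0, 0.0, 0.1, 0.1, 0.2, 0.0, 2.5],
]
end_scores = [
    [0.0, 0.0, 0.4, 2.2, 0.1, 0.0, 0.3],
    [0.0, 0.0, 0.1, 0.2, 0.1, 0.0, 2.4],
]

result = extract(tokens, categories, start_scores, end_scores)
print(result)
```

In this toy setup a new category only needs a new score row, which hints at why a query-driven pointer formulation can, in principle, accommodate categories not seen during training; how PPN actually achieves its zero-shot and few-shot behavior is not specified in the abstract.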
Related papers
- Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$).
GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction [3.579132482505273]
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs.
Existing approaches often fall short of two key aspects: richness of representation and coherence in output structure.
In our work, we introduce EnriCo, which mitigates these shortcomings.
arXiv Detail & Related papers (2024-04-18T20:15:48Z)
- LaSagnA: Language-based Segmentation Assistant for Complex Queries [39.620806493454616]
Large Language Models for Vision (vLLMs) generate detailed perceptual outcomes, including bounding boxes and masks.
In this study, we acknowledge that the main cause of these problems is the insufficient complexity of training queries.
We present three novel strategies to effectively handle the challenges arising from the direct integration of the proposed format.
arXiv Detail & Related papers (2024-04-12T14:40:45Z)
- PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction [28.205723817300576]
Document pair extraction aims to identify key and value entities as well as their relationships from visually-rich documents.
Most existing methods divide it into two separate tasks: semantic entity recognition (SER) and relation extraction (RE).
This paper introduces a novel framework, PEneo, which performs document pair extraction in a unified pipeline.
arXiv Detail & Related papers (2024-01-07T12:48:07Z)
- Efficient and Effective Deep Multi-view Subspace Clustering [9.6753782215283]
We propose a novel deep framework, termed Efficient and Effective deep Multi-View Subspace Clustering (E$^2$MVSC).
Instead of a parameterized FC layer, we design a Relation-Metric Net that decouples network parameter scale from sample numbers for greater computational efficiency.
E$^2$MVSC yields comparable results to existing methods and achieves state-of-the-art performance in various types of multi-view datasets.
arXiv Detail & Related papers (2023-10-15T03:08:25Z)
- A Unified One-Step Solution for Aspect Sentiment Quad Prediction [3.428123050377681]
Aspect sentiment quad prediction (ASQP) is a challenging yet significant subtask in aspect-based sentiment analysis.
We release two new datasets for ASQP, which contain the following characteristics: larger size, more words per sample, and higher density.
We propose a unified one-step solution for ASQP, namely One-ASQP, to detect the aspect categories and to identify the aspect-opinion-sentiment triplets simultaneously.
arXiv Detail & Related papers (2023-06-07T05:00:01Z)
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- Data Augmentation for Abstractive Query-Focused Multi-Document Summarization [129.96147867496205]
We present two QMDS training datasets, which we construct using two data augmentation methods.
These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries.
We build end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets.
arXiv Detail & Related papers (2021-03-02T16:57:01Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.