GroupLink: An End-to-end Multitask Method for Word Grouping and Relation
Extraction in Form Understanding
- URL: http://arxiv.org/abs/2105.04650v1
- Date: Mon, 10 May 2021 20:15:06 GMT
- Title: GroupLink: An End-to-end Multitask Method for Word Grouping and Relation
Extraction in Form Understanding
- Authors: Zilong Wang, Mingjie Zhan, Houxing Ren, Zhaohui Hou, Yuwei Wu, Xingyan
Zhang, Ding Liang
- Abstract summary: We build an end-to-end model through multitask training that combines word grouping and relation extraction, enhancing performance on each task.
We validate our proposed method on a real-world, fully-annotated, noisy-scanned benchmark, FUNSD.
- Score: 25.71040852477277
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Forms are a common type of document in real life and carry rich information
through textual contents and the organizational structure. To realize automatic
processing of forms, word grouping and relation extraction are two fundamental
steps that follow preliminary optical character recognition (OCR). Word
grouping aggregates words that belong to the same semantic entity, and relation
extraction predicts the links between semantic entities. Existing works treat
them as two separate tasks, yet the two are correlated and can reinforce each
other: grouping refines the integrated representation of each entity, and
linking provides feedback on grouping quality. To exploit this, we acquire
multimodal features from both textual data and layout information and build an
end-to-end model, trained with multitask learning, that combines word grouping
and relation extraction so that each task enhances the other. We
validate our proposed method on a real-world, fully-annotated, noisy-scanned
benchmark, FUNSD, and extensive experiments demonstrate the effectiveness of
our method.
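The described setup lends itself to a compact illustration. Below is a minimal sketch, assuming PyTorch, pre-extracted word-level text embeddings and bounding-box layout features, and simple pairwise heads; the class name GroupLinkSketch, the hidden sizes, and the loss weighting alpha are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the multitask setup described above (not the authors'
# released code). Word-level text embeddings and bounding-box layout features
# are fused, passed through a shared encoder, and scored by two pairwise heads:
# grouping (do two words belong to the same entity?) and linking (is there a
# relation between them?). Entity pooling is omitted for brevity.
import torch
import torch.nn as nn


class GroupLinkSketch(nn.Module):
    def __init__(self, text_dim=768, layout_dim=4, hidden=256):
        super().__init__()
        self.fuse = nn.Linear(text_dim + layout_dim, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Each pairwise head sees the concatenated representations of a word pair.
        self.group_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.link_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_feats, layout_feats):
        # text_feats: (B, N, text_dim); layout_feats: (B, N, layout_dim)
        h = self.encoder(self.fuse(torch.cat([text_feats, layout_feats], dim=-1)))
        n = h.size(1)
        pairs = torch.cat(
            [h.unsqueeze(2).expand(-1, -1, n, -1),
             h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1
        )  # (B, N, N, 2 * hidden)
        return self.group_head(pairs).squeeze(-1), self.link_head(pairs).squeeze(-1)


def multitask_loss(group_logits, link_logits, group_tgt, link_tgt, alpha=0.5):
    # Joint objective: gradients from linking also shape the shared encoder
    # used for grouping, which is the intended mutual reinforcement.
    bce = nn.BCEWithLogitsLoss()
    return alpha * bce(group_logits, group_tgt) + (1 - alpha) * bce(link_logits, link_tgt)
```

The single shared encoder is the key design point: both losses backpropagate into the same multimodal representation, which is how the two tasks can reinforce each other.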
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
- Multi-grained Label Refinement Network with Dependency Structures for Joint Intent Detection and Slot Filling [13.963083174197164]
The intent and semantic components of an utterance depend on the syntactic elements of the sentence.
In this paper, we investigate a multi-grained label refinement network, which utilizes dependency structures and label semantic embeddings.
To enhance syntactic representations, we introduce the dependency structures of sentences into our model via a graph attention layer.
arXiv Detail & Related papers (2022-09-09T07:27:38Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- TAGPRIME: A Unified Framework for Relational Structure Extraction [71.88926365652034]
TAGPRIME is a sequence tagging model that appends priming words describing the given condition to the input text.
With the self-attention mechanism in pre-trained language models, the priming words make the output contextualized representations contain more information about the given condition.
Extensive experiments and analyses on three different tasks that cover ten datasets across five different languages demonstrate the generality and effectiveness of TAGPRIME.
arXiv Detail & Related papers (2022-05-25T08:57:46Z)
- Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning [54.66399120084227]
Recent state-of-the-art neural text matching models, typically built on pre-trained language models (PLMs), struggle to generalize to different tasks.
We adopt a specialization-generalization training strategy and refer to it as Match-Prompt.
In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens.
In the generalization stage, the text matching model explores the essential matching signals by being trained on diverse matching tasks.
arXiv Detail & Related papers (2022-04-06T11:01:08Z)
- Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents [19.035917264711664]
We propose a training strategy for text semantic matching by disentangling keywords from intents.
Our approach can be easily combined with pre-trained language models (PLM) without influencing their inference efficiency.
arXiv Detail & Related papers (2022-03-06T07:48:24Z)
- DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding [15.814603044233085]
We focus on the most elementary components, the key-value pairs, and adopt multimodal methods to extract features.
We utilize state-of-the-art models and design targeted extraction modules to extract multimodal features.
A hybrid fusion method of concatenation and feature shifting is designed to fuse the heterogeneous features and provide an informative joint representation (see the sketch after this list).
arXiv Detail & Related papers (2020-10-15T08:54:17Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
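On the DocStruct entry above (referenced there), "a hybrid fusion method of concatenation and feature shifting" can be read as concatenating modalities while also letting one modality shift the other's features. The sketch below is one plausible, hedged reading in PyTorch, assuming a FiLM-style additive shift; the name HybridFusionSketch and all dimensions are hypothetical, not the paper's exact design.

```python
# Hedged sketch of a "concatenation + feature shifting" fusion, assuming an
# additive shift of the text feature conditioned on the layout/visual feature.
# Names and dimensions are illustrative, not DocStruct's exact module.
import torch
import torch.nn as nn


class HybridFusionSketch(nn.Module):
    def __init__(self, text_dim=768, aux_dim=128, out_dim=256):
        super().__init__()
        # Predict an additive shift for the text feature from the auxiliary
        # (layout/visual) feature.
        self.shift = nn.Linear(aux_dim, text_dim)
        # Project the concatenated (shifted text || auxiliary) features.
        self.proj = nn.Linear(text_dim + aux_dim, out_dim)

    def forward(self, text_feat, aux_feat):
        shifted = text_feat + self.shift(aux_feat)       # feature shifting
        joint = torch.cat([shifted, aux_feat], dim=-1)   # concatenation
        return torch.relu(self.proj(joint))              # joint representation


# Usage: fuse per-fragment text embeddings with layout embeddings.
fusion = HybridFusionSketch()
text = torch.randn(8, 768)    # e.g. 8 text fragments
layout = torch.randn(8, 128)
joint = fusion(text, layout)  # (8, 256)
```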