Spatial Dual-Modality Graph Reasoning for Key Information Extraction
- URL: http://arxiv.org/abs/2103.14470v1
- Date: Fri, 26 Mar 2021 13:46:00 GMT
- Title: Spatial Dual-Modality Graph Reasoning for Key Information Extraction
- Authors: Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin and Wayne Zhang
- Abstract summary: We propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images.
We release a new dataset named WildReceipt, which is collected and annotated for the evaluation of key information extraction from document images of unseen templates in the wild.
- Score: 31.04597531115209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Key information extraction from document images is of paramount importance in
office automation. Conventional template matching based approaches fail to
generalize well to document images of unseen templates, and are not robust
against text recognition errors. In this paper, we propose an end-to-end
Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key
information from unstructured document images. We model document images as
dual-modality graphs, nodes of which encode both the visual and textual
features of detected text regions, and edges of which represent the spatial
relations between neighboring text regions. The key information extraction is
solved by iteratively propagating messages along graph edges and reasoning the
categories of graph nodes. In order to roundly evaluate our proposed method as
well as boost the future research, we release a new dataset named WildReceipt,
which is collected and annotated tailored for the evaluation of key information
extraction from document images of unseen templates in the wild. It contains 25
key information categories, a total of about 69000 text boxes, and is about 2
times larger than the existing public datasets. Extensive experiments validate
that all information including visual features, textual features and spatial
relations can benefit key information extraction. It has been shown that SDMG-R
can effectively extract key information from document images of unseen
templates, and obtain new state-of-the-art results on the recent popular
benchmark SROIE and our WildReceipt. Our code and dataset will be publicly
released.
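The graph reasoning described above can be sketched in a few lines. The following is a minimal, illustrative implementation assuming hypothetical feature shapes and random stand-in features; it is not the authors' released code, only the general pattern of fusing visual and textual node features and iteratively propagating messages over spatial edges before classifying each node.

```python
# Minimal sketch of dual-modality graph reasoning for KIE.
# All shapes, weights, and features are illustrative assumptions,
# not the SDMG-R implementation.
import numpy as np

rng = np.random.default_rng(0)

N, D, C = 4, 16, 25                       # text boxes, feature dim, key categories

visual = rng.standard_normal((N, D))      # e.g. pooled CNN features per text crop
textual = rng.standard_normal((N, D))     # e.g. character-level text encodings
h = np.concatenate([visual, textual], axis=1)   # (N, 2D) fused node features

# Edge weights encoding spatial relations between neighboring boxes;
# here a random symmetric, row-normalized adjacency stands in for them.
adj = rng.random((N, N))
adj = (adj + adj.T) / 2.0
np.fill_diagonal(adj, 0.0)
adj = adj / adj.sum(axis=1, keepdims=True)

W = rng.standard_normal((2 * D, 2 * D)) * 0.1   # shared message transform

for _ in range(3):                        # iterative message passing
    # aggregate neighbor features, transform, ReLU, add residual
    h = np.maximum(adj @ h @ W, 0.0) + h

W_cls = rng.standard_normal((2 * D, C)) * 0.1
logits = h @ W_cls                        # (N, C) category scores
labels = logits.argmax(axis=1)            # per-box category prediction
```

In a trained model the adjacency and transforms would be learned, and the residual connection keeps the original fused features available across propagation steps.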
Related papers
- See then Tell: Enhancing Key Information Extraction with Vision Grounding [54.061203106565706]
We introduce STNet (See then Tell Net), a novel end-to-end model designed to deliver precise answers with relevant vision grounding.
To enhance the model's seeing capabilities, we collect extensive structured table recognition datasets.
arXiv Detail & Related papers (2024-09-29T06:21:05Z)
- DUBLIN -- Document Understanding By Language-Image Network [37.42637168606938]
We propose DUBLIN, which is pretrained on web pages using three novel objectives.
We show that DUBLIN is the first pixel-based model to achieve an EM of 77.75 and F1 of 84.25 on the WebSRC dataset.
We also achieve competitive performance on RVL-CDIP document classification.
arXiv Detail & Related papers (2023-05-23T16:34:09Z)
- SelfDocSeg: A Self-Supervised Vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a well-known problem in the document research community.
With growing internet connectivity in personal life, an enormous number of documents have become available in the public domain.
We address this challenge using self-supervision, in contrast to the few existing self-supervised document segmentation approaches.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost by up to 6,000x compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- RDU: A Region-based Approach to Form-style Document Understanding [69.29541701576858]
Key Information Extraction (KIE) is aimed at extracting structured information from form-style documents.
We develop a new KIE model named Region-based Document Understanding (RDU).
RDU takes as input the text content and corresponding coordinates of a document, and tries to predict the result by localizing a bounding-box-like region.
arXiv Detail & Related papers (2022-06-14T14:47:48Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
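The one-shot setting above can be illustrated as a matching problem: align the text boxes of a query document to the labeled boxes of a single support (template) document and transfer their labels. The sketch below uses a plain Hungarian assignment over feature distances as a hypothetical stand-in for the paper's learned deep partial graph matching; the features and label names are made up for illustration.

```python
# Toy one-shot KIE as box matching: transfer labels from one annotated
# support document to a query document. Plain assignment over Euclidean
# distances stands in for learned deep partial graph matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

# Labeled boxes of the single support (template) document.
support_feats = rng.standard_normal((5, 8))
support_labels = ["store", "date", "total", "tax", "other"]

# Query boxes: noisy copies of support boxes 2, 0, and 4.
query_feats = support_feats[[2, 0, 4]] + 0.01 * rng.standard_normal((3, 8))

# Cost matrix: pairwise Euclidean distances between query and support boxes.
cost = np.linalg.norm(query_feats[:, None, :] - support_feats[None, :, :], axis=-1)

# Optimal partial assignment (3 query boxes matched among 5 support boxes).
rows, cols = linear_sum_assignment(cost)
pred = [support_labels[c] for c in cols]
print(pred)  # → ['total', 'store', 'other']
```

A trained model would replace the raw distance cost with learned node and edge affinities, which is what makes the matching robust across layouts rather than only across noise.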
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- Key Information Extraction From Documents: Evaluation And Generator [3.878105750489656]
This research project compares state-of-the-art models for information extraction from documents.
The results have shown that NLP based pre-processing is beneficial for model performance.
The use of a bounding box regression decoder increases the model performance only for fields that do not follow a rectangular shape.
arXiv Detail & Related papers (2021-06-09T16:12:21Z)
- Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.