Towards Robust Visual Information Extraction in Real World: New Dataset
and Novel Solution
- URL: http://arxiv.org/abs/2102.06732v1
- Date: Sun, 24 Jan 2021 11:05:24 GMT
- Title: Towards Robust Visual Information Extraction in Real World: New Dataset
and Novel Solution
- Authors: Jiapeng Wang, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang,
Shuaitao Zhang, Qianying Wang, Yaqiang Wu, Mingxiang Cai
- Abstract summary: We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
- Score: 30.438041837029875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual information extraction (VIE) has attracted considerable attention
recently owing to its various advanced applications such as document
understanding, automatic marking and intelligent education. Most existing works
decouple this problem into several independent sub-tasks of text spotting
(text detection and recognition) and information extraction, completely
ignoring the high correlation among them during optimization. In this paper, we
propose a robust visual information extraction system (VIES) towards real-world
scenarios, which is a unified end-to-end trainable framework for simultaneous
text detection, recognition and information extraction by taking a single
document image as input and outputting the structured information.
Specifically, the information extraction branch collects abundant visual and
semantic representations from text spotting for multimodal feature fusion and
conversely, provides higher-level semantic clues to contribute to the
optimization of text spotting. Moreover, to address the shortage of public
benchmarks, we construct a fully-annotated dataset called EPHOIE
(https://github.com/HCIILAB/EPHOIE), which is the first Chinese benchmark for
both text spotting and visual information extraction. EPHOIE consists of 1,494
images of examination paper heads with complex layouts and backgrounds, including
a total of 15,771 Chinese handwritten or printed text instances. Compared with
the state-of-the-art methods, our VIES shows significantly superior performance
on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used
SROIE dataset under the end-to-end scenario.
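For illustration, the fusion idea described above can be sketched as a small PyTorch module that combines per-instance visual features (e.g., pooled detection features) with semantic features (e.g., recognition-branch encodings) before entity classification. This is a minimal sketch under assumed dimensions, module names, and fusion operator, not the actual VIES architecture.

```python
# Hedged sketch of the multimodal fusion idea from the abstract: per-text-instance
# visual features (detection branch) and semantic features (recognition branch)
# are fused before entity classification. All names and shapes are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class FusionEntityHead(nn.Module):
    def __init__(self, visual_dim=256, semantic_dim=256, hidden_dim=256, num_entities=10):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.semantic_proj = nn.Linear(semantic_dim, hidden_dim)
        # A lightweight self-attention layer lets text instances exchange context
        # (e.g., a key like "Name:" informing the adjacent value field).
        self.context = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_entities)

    def forward(self, visual_feats, semantic_feats):
        # visual_feats:   (batch, num_instances, visual_dim)   e.g. pooled RoI features
        # semantic_feats: (batch, num_instances, semantic_dim) e.g. recognition encodings
        fused = self.visual_proj(visual_feats) + self.semantic_proj(semantic_feats)
        fused = self.context(fused)
        return self.classifier(fused)  # (batch, num_instances, num_entities)


if __name__ == "__main__":
    head = FusionEntityHead()
    v = torch.randn(2, 12, 256)   # 12 detected text instances per image
    s = torch.randn(2, 12, 256)
    print(head(v, s).shape)       # torch.Size([2, 12, 10])
```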
Related papers
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution [48.693941280097974]
We propose a large-scale dataset consisting of camera images for visual information extraction (VIE).
We also propose a novel framework that combines the stages of OCR and information extraction in an end-to-end learning fashion.
We evaluate existing end-to-end VIE methods on the proposed dataset and observe a noticeable performance drop from SROIE to our dataset, owing to the larger variance of layouts and entities.
arXiv Detail & Related papers (2023-05-12T14:11:47Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features from text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
- PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [5.210482046387142]
Key Information Extraction from documents remains a challenge.
We introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE; a minimal graph-convolution sketch in this spirit appears after this list.
Our method outperforms baseline methods by significant margins.
arXiv Detail & Related papers (2020-04-16T05:20:16Z)
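Several of the entries above (PICK in particular, and the one-shot KIE work) model a document page as a graph over text segments. The following is a minimal sketch of a single graph-convolution layer over segment nodes for key information extraction; the feature dimensions, adjacency construction, and label set are assumptions for illustration, not the published PICK implementation.

```python
# Hedged sketch: one graph-convolution step over text-segment nodes, in the
# spirit of graph-based KIE methods such as PICK. Node features, adjacency
# construction, and labels are illustrative assumptions.
import torch
import torch.nn as nn


class SegmentGraphConv(nn.Module):
    def __init__(self, in_dim=256, out_dim=256, num_labels=5):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.classifier = nn.Linear(out_dim, num_labels)

    def forward(self, node_feats, adj):
        # node_feats: (num_segments, in_dim)            text + layout embedding per segment
        # adj:        (num_segments, num_segments)      soft adjacency, e.g. from box distances
        norm_adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        hidden = torch.relu(self.linear(norm_adj @ node_feats))  # aggregate neighbor features
        return self.classifier(hidden)  # per-segment key/value label logits


if __name__ == "__main__":
    feats = torch.randn(8, 256)                  # 8 text segments on a page
    adj = torch.rand(8, 8)                       # assumed spatial-affinity adjacency
    print(SegmentGraphConv()(feats, adj).shape)  # torch.Size([8, 5])
```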