TRIE: End-to-End Text Reading and Information Extraction for Document
Understanding
- URL: http://arxiv.org/abs/2005.13118v3
- Date: Mon, 25 Oct 2021 09:33:53 GMT
- Authors: Peng Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Jing Lu, Liang
Qiao, Yi Niu, and Fei Wu
- Abstract summary: We propose a unified end-to-end text reading and information
extraction network. The multimodal visual and textual features of text reading
are fused for information extraction. Our proposed method significantly
outperforms the state-of-the-art methods in both efficiency and accuracy.
- Score: 56.1416883796342
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and
leaflets) contain rich information, automatic document image understanding has
become a hot topic. Most existing works decouple the problem into two separate
tasks, (1) text reading for detecting and recognizing texts in images and (2)
information extraction for analyzing and extracting key elements from
previously extracted plain text. However, they mainly focus on improving
information extraction task, while neglecting the fact that text reading and
information extraction are mutually correlated. In this paper, we propose a
unified end-to-end text reading and information extraction network, where the
two tasks can reinforce each other. Specifically, the multimodal visual and
textual features of text reading are fused for information extraction and in
turn, the semantics in information extraction contribute to the optimization of
text reading. On three real-world datasets with diverse document images (from
fixed layout to variable layout, from structured text to semi-structured text),
our proposed method significantly outperforms the state-of-the-art methods in
both efficiency and accuracy.
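The fusion idea described in the abstract (visual and textual features of text reading combined for information extraction) can be sketched roughly as below. All names, dimensions, and the simple concatenate-then-linear-classify design are illustrative assumptions for intuition only, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_classify(visual_feat, textual_feat, W, b):
    """Concatenate per-token visual and textual features and score
    entity classes with one linear layer + softmax (illustrative only)."""
    fused = np.concatenate([visual_feat, textual_feat], axis=-1)
    logits = fused @ W + b
    # numerically stable softmax over the class dimension
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 4 recognized text tokens, 32-d visual features,
# 16-d textual features, 3 entity classes (e.g. key, value, other).
visual = rng.normal(size=(4, 32))
textual = rng.normal(size=(4, 16))
W = rng.normal(size=(48, 3)) * 0.1
b = np.zeros(3)

probs = fuse_and_classify(visual, textual, W, b)
print(probs.shape)  # (4, 3): one class distribution per token
```

In the paper's end-to-end setting, gradients from the extraction loss would also flow back into the text-reading features, which is the mutual-reinforcement effect the abstract claims; this sketch only shows the forward fusion step.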
Related papers
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" (DAT) is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs [96.54224331778195]
We present a text-grounding document understanding model, termed TGDoc, which enhances MLLMs with the ability to discern the spatial positioning of text within images.
We formulate instruction tuning tasks including text detection, recognition, and spotting to facilitate the cohesive alignment between the visual encoder and large language model.
Our method achieves state-of-the-art performance across multiple text-rich benchmarks, validating the effectiveness of our method.
arXiv Detail & Related papers (2023-11-22T06:46:37Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves the F-score by +2.5% and +4.8% when its weights are transferred to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents [1.4866448722906016]
Our proposed model is designed to perform noise reduction and text region enhancement as well as text detection.
We enrich the training data for the model with synthesized document images that are fully labeled for text detection and enhancement.
Our method is demonstrated on a real document dataset, with performance exceeding that of other text detection methods.
arXiv Detail & Related papers (2021-06-10T07:08:31Z)
- Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
- Matching Text with Deep Mutual Information Estimation [0.0]
We present a neural approach for general-purpose text matching with deep mutual information estimation incorporated.
Our approach, Text matching with Deep Info Max (TIM), is integrated with a procedure of unsupervised learning of representations.
We evaluate our text matching approach on several tasks including natural language inference, paraphrase identification, and answer selection.
arXiv Detail & Related papers (2020-03-09T15:25:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.