Information Extraction from Visually Rich Documents with Font Style Embeddings
- URL: http://arxiv.org/abs/2111.04045v1
- Date: Sun, 7 Nov 2021 10:29:54 GMT
- Title: Information Extraction from Visually Rich Documents with Font Style Embeddings
- Authors: Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles
- Abstract summary: We propose to challenge the usage of computer vision in the case where both token style and visual representation are available.
Our experiments on three real-world complex datasets demonstrate that using an embedding based on token style attributes instead of a raw visual embedding is beneficial.
- Score: 0.6291443816903801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents, with approaches combining computer vision, natural language processing and layout representation. We propose to challenge the usage of computer vision in the case where both token style and visual representation are available (i.e., native PDF documents). Our experiments on three real-world complex datasets demonstrate that using an embedding based on token style attributes instead of a raw visual embedding in the LayoutLM model is beneficial. Depending on the dataset, such an embedding yields an improvement of 0.18% to 2.29% in the weighted F1-score, together with a 30.7% decrease in the number of trainable parameters of the model, improving both efficiency and effectiveness.
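The token-style idea lends itself to a compact implementation. Below is a minimal PyTorch sketch of a font-style embedding layer that could stand in for a raw visual embedding in a LayoutLM-style model; the attribute set (font id, size bucket, bold and italic flags), the vocabulary sizes, and the class name FontStyleEmbedding are illustrative assumptions, since the abstract does not enumerate the exact style attributes used.

```python
import torch
import torch.nn as nn

class FontStyleEmbedding(nn.Module):
    """Illustrative token-style embedding: maps per-token font attributes
    (hypothetical set: font id, size bucket, bold/italic flags) to a dense
    vector that can replace a raw visual embedding in a LayoutLM-style model."""
    def __init__(self, hidden_size=768, num_fonts=100, num_size_buckets=16):
        super().__init__()
        self.font_emb = nn.Embedding(num_fonts, hidden_size)
        self.size_emb = nn.Embedding(num_size_buckets, hidden_size)
        self.bold_emb = nn.Embedding(2, hidden_size)
        self.italic_emb = nn.Embedding(2, hidden_size)

    def forward(self, font_ids, size_buckets, is_bold, is_italic):
        # Sum the attribute embeddings, mirroring how LayoutLM sums its
        # text, 1D position, and 2D layout embeddings.
        return (self.font_emb(font_ids)
                + self.size_emb(size_buckets)
                + self.bold_emb(is_bold)
                + self.italic_emb(is_italic))

# Usage: one style vector per token, added to the usual LayoutLM inputs.
style = FontStyleEmbedding()
batch, seq_len = 2, 128
vec = style(
    torch.randint(100, (batch, seq_len)),  # font ids
    torch.randint(16, (batch, seq_len)),   # size buckets
    torch.randint(2, (batch, seq_len)),    # bold flags
    torch.randint(2, (batch, seq_len)),    # italic flags
)
print(vec.shape)  # torch.Size([2, 128, 768])
```

Because such lookup tables are tiny compared with a CNN-based visual backbone, swapping one in for the image branch is consistent with the reported 30.7% reduction in trainable parameters.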
Related papers
- Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning [19.28860833813788]
Existing models commonly train a visual encoder with weak cross-modal supervision signals.
We propose a novel Visually-Asymmetric coNsistenCy Learning (VANCL) approach to capture fine-grained visual and layout features.
arXiv Detail & Related papers (2023-10-23T10:37:22Z)
- Unveiling Document Structures with YOLOv5 Layout Detection [0.0]
This research investigates the use of YOLOv5, a cutting-edge computer vision model, to rapidly identify document layouts and extract unstructured data.
The main objective is to create an autonomous system that can effectively recognize document layouts and extract unstructured data.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen in the infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-12T14:11:47Z)
- Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution [48.693941280097974]
We propose a large-scale dataset consisting of camera images for visual information extraction (VIE).
We propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion.
We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that their performance drops noticeably from SROIE to our dataset, due to the larger variance in layouts and entities.
arXiv Detail & Related papers (2023-05-12T14:11:47Z)
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of irrelevant retrieved examples and retains those that are beneficial to the input query (a minimal sketch of this weighting idea appears after this list).
We show that it achieves state-of-the-art accuracy on the ImageNet-LT, Places-LT and WebVision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding [49.941806975280045]
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks.
We present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework.
arXiv Detail & Related papers (2020-12-29T13:01:52Z)
- Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models [23.42593796135709]
We study the problem of information extraction from visually rich documents (VRDs).
We present a model that combines the power of large pre-trained language models and graph neural networks to efficiently encode both textual and visual information in business documents.
arXiv Detail & Related papers (2020-05-22T06:04:50Z)
- PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [5.210482046387142]
Key Information Extraction from documents remains a challenge.
We introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE.
Our method outperforms baseline methods by significant margins.
arXiv Detail & Related papers (2020-04-16T05:20:16Z)
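As a companion to the retrieval-augmented recognition entry above ("Improving Image Recognition by Retrieving from Web-Scale Image-Text Data"), here is a minimal sketch of an attention module that weights k retrieved examples by their relevance to the query. The dimensions, projection layers, and the class name RetrievalAttention are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalAttention(nn.Module):
    """Illustrative attention over k retrieved examples: the query
    embedding scores each retrieved embedding, so irrelevant retrievals
    receive low weight and contribute little to the fused representation."""
    def __init__(self, dim=512):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query, retrieved):
        # query: (batch, dim); retrieved: (batch, k, dim)
        q = self.q_proj(query).unsqueeze(1)              # (batch, 1, dim)
        k = self.k_proj(retrieved)                       # (batch, k, dim)
        scores = (q * k).sum(-1) * self.scale            # (batch, k)
        weights = F.softmax(scores, dim=-1)              # importance per example
        fused = (weights.unsqueeze(-1) * retrieved).sum(1)  # (batch, dim)
        return fused, weights

attn = RetrievalAttention()
fused, w = attn(torch.randn(4, 512), torch.randn(4, 16, 512))
print(fused.shape, w.shape)  # torch.Size([4, 512]) torch.Size([4, 16])
```

The softmax weighting drives the contribution of irrelevant retrievals toward zero, which is the filtering behavior that entry's summary describes.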