Key Information Extraction From Documents: Evaluation And Generator
- URL: http://arxiv.org/abs/2106.14624v1
- Date: Wed, 9 Jun 2021 16:12:21 GMT
- Title: Key Information Extraction From Documents: Evaluation And Generator
- Authors: Oliver Bensch, Mirela Popa and Constantin Spille
- Abstract summary: This research project compares state-of-the-art models for information extraction from documents.
The results have shown that NLP-based pre-processing is beneficial for model performance.
The use of a bounding box regression decoder increases the model performance only for fields that do not follow a rectangular shape.
- Score: 3.878105750489656
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extracting information from documents usually relies on natural language
processing methods working on one-dimensional sequences of text. In some cases,
for example, for the extraction of key information from semi-structured
documents such as invoices, the spatial and formatting information of the
text is crucial to understanding the contextual meaning. Convolutional neural
networks are already common in computer vision models to process and extract
relationships in multidimensional data. Therefore, natural language processing
models have already been combined with computer vision models in the past, to
benefit from e.g. positional information and to improve performance of these
key information extraction models. Existing models were either trained on
unpublished data sets or on an annotated collection of receipts, which did not
focus on PDF-like documents. Hence, in this research project a template-based
document generator was created to compare state-of-the-art models for
information extraction. An existing information extraction model, "Chargrid"
(Katti et al., 2019), was reconstructed, and the impact of a bounding box
regression decoder, as well as the impact of an NLP pre-processing step, was
evaluated for information extraction from documents. The results have shown
that NLP-based pre-processing is beneficial for model performance. However, the
use of a bounding box regression decoder increases the model performance only
for fields that do not follow a rectangular shape.
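The abstract notes that spatial information matters for semi-structured documents and that the Chargrid model was reconstructed. As a rough illustration of the underlying idea only, the sketch below rasterises character bounding boxes into a 2D grid of integer character indices, which is how a chargrid-style input keeps layout alongside text. The function name `build_chargrid`, the `CharBox` tuple layout, the vocabulary, and the toy page are all assumptions made for this example; they are not taken from the paper or its code.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (character, x0, y0, x1, y1) in pixel
# coordinates, where x1 and y1 are exclusive.
CharBox = Tuple[str, int, int, int, int]


def build_chargrid(chars: List[CharBox], width: int, height: int,
                   vocab: Dict[str, int]) -> List[List[int]]:
    """Rasterise character boxes into a height x width grid of integer indices.

    0 marks background; in this sketch unknown characters also map to 0.
    """
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in chars:
        index = vocab.get(ch.lower(), 0)
        # Clip each box to the page so out-of-range boxes cannot raise.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = index
    return grid


# Assumed vocabulary (a-z, then 0-9) and a tiny 8x3 "page" with the text "No 7".
vocab = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz0123456789")}
chars = [("N", 0, 0, 2, 3), ("o", 2, 0, 4, 3), ("7", 6, 0, 8, 3)]
grid = build_chargrid(chars, width=8, height=3, vocab=vocab)
for row in grid:
    print(row)
```

Each cell holds the index of the character covering that pixel, so a convolutional network can read both what the text says and where it sits on the page. A real pipeline would typically take the boxes from OCR or a PDF parser and downsample the grid before training.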
Related papers
- Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images.
We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
- A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents [0.0]
We present a model that can match or outperform the current state-of-the-art results in Relation Extraction (RE) applied to Visually-Rich Documents (VRD).
We also report an extensive ablation study performed on FUNSD, highlighting the great impact of certain features and modeling choices on performance.
arXiv Detail & Related papers (2024-04-16T18:50:57Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction [6.78974856327994]
Adverse Drug Event (ADE) extraction is one of the core tasks in digital pharmacovigilance.
We evaluate 19 Transformer-based models for ADE extraction on informal texts.
At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.
arXiv Detail & Related papers (2023-06-08T15:25:24Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- Spatial Dual-Modality Graph Reasoning for Key Information Extraction [31.04597531115209]
We propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images.
We release a new dataset named WildReceipt, which is collected and annotated for the evaluation of key information extraction from document images of unseen templates in the wild.
arXiv Detail & Related papers (2021-03-26T13:46:00Z)
- OCR Graph Features for Manipulation Detection in Documents [11.193867567895353]
We propose a model that leverages graph features using OCR (Optical Character Recognition).
Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features.
We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections.
arXiv Detail & Related papers (2020-09-10T21:50:45Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [4.525267347429154]
We train a Transformer-based neural model conditioned on the BERT language model.
In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size.
The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
arXiv Detail & Related papers (2020-03-29T14:00:17Z)