DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
- URL: http://arxiv.org/abs/2304.12484v2
- Date: Mon, 1 May 2023 21:09:08 GMT
- Title: DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
- Authors: Mohamed Dhouib, Ghassen Bettaieb and Aymen Shabou
- Abstract summary: DocParser is an OCR-free end-to-end information extraction model that better extracts discriminative character features from visually rich documents.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information Extraction from visually rich documents is a challenging task
that has gained a lot of attention in recent years due to its importance in
several document-control based applications and its widespread commercial
value. Most of the research conducted on this topic to date follows a two-step
pipeline: first, the text is read using an off-the-shelf Optical Character
Recognition (OCR) engine; then, the fields of interest are extracted from the
obtained text. The main drawback of these approaches is their
dependence on an external OCR system, which can negatively impact both
performance and computational speed. Recent OCR-free methods have been proposed
to address these issues. Inspired by their promising results, we propose in
this paper an OCR-free end-to-end information extraction model named DocParser.
It differs from prior end-to-end approaches by its ability to better extract
discriminative character features. DocParser achieves state-of-the-art results
on various datasets, while still being faster than previous works.
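To make the contrast concrete, below is a minimal sketch of the two designs discussed in the abstract. All names (run_ocr, extract_fields, model.predict) are hypothetical placeholders, not DocParser's actual API.

```python
# Sketch only: contrasts the classic two-step pipeline with an OCR-free,
# end-to-end design. Every function here is a hypothetical stand-in.

def run_ocr(image_bytes: bytes) -> str:
    """Stand-in for an off-the-shelf OCR engine (step 1 of the pipeline)."""
    return "INVOICE No. 42 TOTAL: 13.37 EUR"  # dummy output

def extract_fields(text: str) -> dict:
    """Stand-in for the field-extraction step applied to OCR output (step 2)."""
    fields = {}
    for token in text.split():
        if token.replace(".", "", 1).isdigit():
            fields["total"] = token  # naive rule, for illustration only
    return fields

def two_step_pipeline(image_bytes: bytes) -> dict:
    # OCR errors and latency propagate into extraction: the drawback noted above.
    return extract_fields(run_ocr(image_bytes))

def end_to_end_pipeline(image_bytes: bytes, model) -> dict:
    # OCR-free design: a single model maps pixels directly to structured fields.
    return model.predict(image_bytes)  # hypothetical model interface
```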
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
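Read loosely, the first method resembles an InfoNCE loss whose candidate pool is augmented with each positive document's corpus neighbors. A minimal sketch under that assumption (the paper's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def neighbor_contrastive_loss(query_emb, doc_emb, neighbor_emb, temperature=0.05):
    """InfoNCE-style loss where the in-batch candidates are augmented with
    embeddings of each positive document's corpus neighbors.

    query_emb:    (B, d) query embeddings
    doc_emb:      (B, d) positive document embeddings
    neighbor_emb: (B, K, d) neighbor documents mined for each positive
    """
    B, K, d = neighbor_emb.shape
    queries = F.normalize(query_emb, dim=-1)
    candidates = F.normalize(
        torch.cat([doc_emb, neighbor_emb.reshape(B * K, d)], dim=0), dim=-1)
    logits = queries @ candidates.T / temperature  # (B, B + B*K)
    targets = torch.arange(B)  # the i-th query's positive is the i-th document
    return F.cross_entropy(logits, targets)
```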
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding [55.4806974284156]
Document understanding refers to automatically extracting, analyzing, and comprehending information from digital documents, such as web pages.
Existing Multimodal Large Language Models (MLLMs) have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition.
arXiv Detail & Related papers (2023-07-04T11:28:07Z)
- Information Extraction from Documents: Question Answering vs Token Classification in real-world setups [0.0]
We compare the Question Answering approach with the classical token classification approach for document key information extraction.
Our research showed that when dealing with clean and relatively short entities, a token classification-based approach still works best.
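As a rough illustration of the token-classification approach, tokens receive BIO labels and contiguous spans are decoded into fields. The tokens and labels below are invented for the example, not drawn from the paper's data:

```python
# Sketch: decode BIO-tagged tokens into (field, text) pairs.

def decode_bio(tokens, labels):
    fields, current_field, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):  # a new entity span begins
            if current_field:
                fields.append((current_field, " ".join(current_tokens)))
            current_field, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_field == label[2:]:
            current_tokens.append(token)  # continue the current span
        else:  # "O" or an inconsistent tag ends any open span
            if current_field:
                fields.append((current_field, " ".join(current_tokens)))
            current_field, current_tokens = None, []
    if current_field:
        fields.append((current_field, " ".join(current_tokens)))
    return fields

tokens = ["Invoice", "date", ":", "01", "May", "2023"]
labels = ["O", "O", "O", "B-DATE", "I-DATE", "I-DATE"]
print(decode_bio(tokens, labels))  # [('DATE', '01 May 2023')]
```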
arXiv Detail & Related papers (2023-04-21T14:43:42Z)
- Fine-Grained Distillation for Long Document Retrieval [86.39802110609062]
Long document retrieval aims to fetch query-relevant documents from a large-scale collection.
Knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder.
We propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers.
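The distillation step alluded to here is commonly implemented as a KL divergence between the student retriever's and the teacher cross-encoder's score distributions over shared candidates; the sketch below shows only that generic recipe, not FGD's fine-grained additions:

```python
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, temperature=1.0):
    """KL divergence pushing the retriever (student) to match the
    cross-encoder (teacher) over C candidate passages per query.
    Both tensors have shape (B, C)."""
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_p = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean")
```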
arXiv Detail & Related papers (2022-12-20T17:00:36Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
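A minimal sketch of the generate-then-read flow, with `llm` standing in for any text-completion callable (a hypothetical interface, not the paper's code):

```python
def generate_then_read(question: str, llm, n_docs: int = 3) -> str:
    # Step 1: prompt the LLM to generate contextual documents for the question.
    docs = [
        llm(f"Generate a background document to answer: {question}")
        for _ in range(n_docs)
    ]
    # Step 2: read the generated documents to produce the final answer.
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```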
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- OCR-IDL: OCR Annotations for Industry Document Library Dataset [8.905920197601171]
We make public the OCR annotations for IDL documents, produced with a commercial OCR engine.
The contributed dataset (OCR-IDL) has an estimated monetary value of over US$20K.
arXiv Detail & Related papers (2022-02-25T21:30:48Z)
- Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel VDU model that is end-to-end trainable without an underlying OCR framework.
Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- docExtractor: An off-the-shelf historical document element extraction [18.828438308738495]
We present docExtractor, a generic approach for extracting visual elements such as text lines or illustrations from historical documents.
We demonstrate that it provides high-quality performance as an off-the-shelf system across a wide variety of datasets.
We introduce a new public dataset dubbed IlluHisDoc, dedicated to the fine-grained evaluation of illustration segmentation in historical documents.
arXiv Detail & Related papers (2020-12-15T10:19:18Z)
- PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [5.210482046387142]
Key Information Extraction from documents remains a challenge.
We introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE.
Our method outperforms baseline methods by significant margins.
arXiv Detail & Related papers (2020-04-16T05:20:16Z)