A Survey of Deep Learning Approaches for OCR and Document Understanding
- URL: http://arxiv.org/abs/2011.13534v2
- Date: Thu, 4 Feb 2021 23:48:39 GMT
- Title: A Survey of Deep Learning Approaches for OCR and Document Understanding
- Authors: Nishant Subramani, Alexandre Matton, Malcolm Greaves, and Adrian Lam
- Abstract summary: We review different techniques for document understanding for documents written in English.
We consolidate methodologies present in literature to act as a jumping-off point for researchers exploring this area.
- Score: 68.65995739708525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Documents are a core part of many businesses in many fields such as law,
finance, and technology among others. Automatic understanding of documents such
as invoices, contracts, and resumes is lucrative, opening up many new avenues
of business. The fields of natural language processing and computer vision have
seen tremendous progress through the development of deep learning such that
these methods have started to become infused in contemporary document
understanding systems. In this survey paper, we review different techniques for
document understanding for documents written in English and consolidate
methodologies present in literature to act as a jumping-off point for
researchers exploring this area.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 [0.0]
We present a novel approach wherein we distill document understanding knowledge from the proprietary LLM ChatGPT into FLAN-T5.
Our findings underscore the potential of distillation techniques in facilitating the deployment of sophisticated language models in real-world scenarios.
arXiv Detail & Related papers (2024-09-17T15:37:56Z)
- Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review [51.61531917413708]
Deep learning-based approaches for Key Information Extraction have been proposed under the umbrella term Document Understanding.
The goal of this systematic literature review is an in-depth analysis of existing approaches in this domain and the identification of opportunities for further research.
arXiv Detail & Related papers (2024-07-23T08:15:55Z)
- Understanding Archives: Towards New Research Interfaces Relying on the Semantic Annotation of Documents [0.2302001830524133]
We show how the semantic annotation of the textual content of study corpora of archival documents facilitates their exploitation and valorisation.
First, we present a methodological framework for the construction of new interfaces based on textual semantics, then address the current technological obstacles and their potential solutions.
arXiv Detail & Related papers (2024-03-28T07:55:29Z)
- Workshop on Document Intelligence Understanding [3.2929609168290543]
This workshop aims to bring together researchers and industry developers in the field of document intelligence.
We also released a data challenge on the recently introduced document-level VQA dataset, PDFVQA.
arXiv Detail & Related papers (2023-07-31T02:14:25Z)
- DLUE: Benchmarking Document Language Understanding [32.550855843975484]
There is no well-established consensus on how to comprehensively evaluate document understanding abilities.
This paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription.
Under the new evaluation framework, we propose Document Language Understanding Evaluation (DLUE), a new task suite.
arXiv Detail & Related papers (2023-05-16T15:16:24Z)
- Embedding Knowledge for Document Summarization: A Survey [66.76415502727802]
Previous works proved that knowledge-embedded document summarizers excel at generating superior digests.
We propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view.
arXiv Detail & Related papers (2022-04-24T04:36:07Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- Document AI: Benchmarks, Models and Applications [35.46858492311289]
Document AI refers to the techniques for automatically reading, understanding, and analyzing business documents.
In recent years, the popularity of deep learning technology has greatly advanced the development of Document AI.
This paper briefly reviews some of the representative models, tasks, and benchmark datasets.
arXiv Detail & Related papers (2021-11-16T16:43:07Z)
- Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)