Workshop on Document Intelligence Understanding
- URL: http://arxiv.org/abs/2307.16369v1
- Date: Mon, 31 Jul 2023 02:14:25 GMT
- Title: Workshop on Document Intelligence Understanding
- Authors: Soyeon Caren Han, Yihao Ding, Siwen Luo, Josiah Poon, HeeGuen Yoon,
Zhe Huang, Paul Duuring, Eun Jung Holden
- Abstract summary: This workshop aims to bring together researchers and industry developers in the field of document intelligence.
We also released a data challenge on the recently introduced document-level VQA dataset, PDFVQA.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document understanding and information extraction comprise a range of tasks
for automatically understanding a document and extracting valuable information from it.
Recently, demand for document understanding has been rising across domains, including
business, law, and medicine, to boost the efficiency of work that involves large numbers
of documents. This workshop aims to bring together researchers and industry developers
in document intelligence and the understanding of diverse document types to advance
automatic document processing and understanding techniques. We also released a data
challenge on the recently introduced document-level VQA dataset, PDFVQA. The PDFVQA
challenge examines the structural and contextual understanding of proposed models at the
level of natural full documents spanning multiple consecutive pages, with questions whose
answers are sequences of items extracted from multiple pages of the full document. This
task pushes document understanding from the single-page level to the full-document level.
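To make the challenge format concrete, here is a minimal sketch of what a multi-page, answer-sequence VQA record and a naive recall-style scorer could look like; the field names and the metric are illustrative assumptions, not the official PDFVQA schema or evaluation protocol.

```python
# Hypothetical sketch of a document-level VQA record whose answer is a
# sequence of items drawn from several pages. Field names are illustrative,
# not the official PDFVQA schema.
from dataclasses import dataclass, field


@dataclass
class DocVQASample:
    doc_id: str
    question: str
    # Page indices the evidence spans; answers may come from several pages.
    evidence_pages: list = field(default_factory=list)
    # The gold answer is an ordered sequence of items, not a single span.
    answers: list = field(default_factory=list)


def sequence_match(predicted: list, gold: list) -> float:
    """Fraction of gold answer items recovered (order-insensitive toy metric)."""
    if not gold:
        return 0.0
    hits = sum(1 for item in gold if item in predicted)
    return hits / len(gold)


sample = DocVQASample(
    doc_id="paper-0001",
    question="Which section titles appear after the Methods section?",
    evidence_pages=[3, 4, 5],
    answers=["Results", "Discussion", "Conclusion"],
)
print(sequence_match(["Results", "Conclusion"], sample.answers))  # 0.666...
```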
Related papers
- DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models [66.91204604417912]
This study aims to enhance the generalizability of small visual document understanding (VDU) models by distilling knowledge from LLMs.
We present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge.
Experiments show that DocKD produces high-quality document annotations and surpasses the direct knowledge distillation approach.
arXiv Detail & Related papers (2024-10-04T00:53:32Z)
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
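As a rough illustration of the idea (not the paper's method), a document can be kept as an ordered sequence of mixed-modality segments and pooled into a single vector; the `embed_segment` placeholder below stands in for a real vision-language encoder.

```python
# Illustrative sketch of an interleaved document representation: a document is
# an ordered sequence of segments of mixed modality, each embedded and then
# pooled into one document vector. `embed_segment` is a stand-in placeholder,
# not an API from the paper.
from typing import List, Tuple

import numpy as np

Segment = Tuple[str, str]  # (modality, content), modality in {"text", "image", "table"}


def embed_segment(modality: str, content: str, dim: int = 8) -> np.ndarray:
    """Placeholder encoder: a real system would call a vision-language model."""
    rng = np.random.default_rng(abs(hash((modality, content))) % (2**32))
    return rng.standard_normal(dim)


def embed_document(segments: List[Segment]) -> np.ndarray:
    """Embed each segment and mean-pool into one document vector.

    Mean pooling is order-agnostic and keeps the sketch simple; a real model
    would attend across the interleaved sequence instead.
    """
    vectors = [embed_segment(m, c) for m, c in segments]
    return np.mean(vectors, axis=0)


doc: List[Segment] = [
    ("text", "Introduction paragraph ..."),
    ("image", "figure1.png"),
    ("table", "Table 1: results ..."),
    ("text", "Conclusion ..."),
]
print(embed_document(doc).shape)  # (8,)
```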
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering [13.625303311724757]
Document Question Answering (QA) presents a challenge in understanding visually-rich documents (VRDs).
We propose PDF-MVQA, which is tailored for research journal articles, encompassing multiple pages and multimodal information retrieval.
arXiv Detail & Related papers (2024-04-19T09:00:05Z)
- FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding [8.855033708082832]
We introduce FATURA, a pivotal resource for researchers in the field of document analysis and understanding.
FATURA is a highly diverse dataset featuring multi-annotated invoice document images.
We provide comprehensive benchmarks for various document analysis and understanding tasks and conduct experiments under diverse training and evaluation scenarios.
arXiv Detail & Related papers (2023-11-20T15:51:14Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents, which have rich structure.
We propose PDFTriage, which enables models to retrieve context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
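A toy sketch of the triage idea follows, assuming a hypothetical section-keyed document record and a simple keyword router; the paper's actual routing and retrieval are more sophisticated.

```python
# Rough sketch of the triage idea: given a question, decide whether to fetch
# context by document *structure* (e.g., a named section) or by *content*
# (keyword overlap), then retrieve accordingly. The routing heuristic and the
# record layout are assumptions for illustration, not the paper's system.
STRUCTURE_CUES = ("section", "page", "table", "figure", "appendix")


def retrieve_context(question: str, doc: dict) -> list:
    """doc maps section titles to their text; retrieval routes on the question."""
    q = question.lower()
    if any(cue in q for cue in STRUCTURE_CUES):
        # Structural retrieval: return sections whose title the question names.
        return [text for title, text in doc.items() if title.lower() in q]
    # Content retrieval: return sections sharing words with the question.
    q_words = set(q.split())
    return [text for text in doc.values() if q_words & set(text.lower().split())]


doc = {
    "Methods": "We fine-tune a transformer on layout features ...",
    "Results": "Accuracy improves by 4 points on the benchmark ...",
}
print(retrieve_context("What does the Methods section describe?", doc))
```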
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale, weakly labeled data from the web to benefit the training of visually-rich document entity retrieval (VDER) models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z)
- DLUE: Benchmarking Document Language Understanding [32.550855843975484]
There is no well-established consensus on how to comprehensively evaluate document understanding abilities.
This paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription.
Under the new evaluation framework, we propose Document Language Understanding Evaluation (DLUE), a new task suite.
arXiv Detail & Related papers (2023-05-16T15:16:24Z)
- PDFVQA: A New Dataset for Real-World VQA on PDF Documents [2.105395241374678]
Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions.
Our PDF-VQA dataset extends document understanding from its current scale, limited to a single document page, to a new scale that asks questions over full documents of multiple pages.
arXiv Detail & Related papers (2023-04-13T12:28:14Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- A Survey of Deep Learning Approaches for OCR and Document Understanding [68.65995739708525]
We review different techniques for understanding documents written in English.
We consolidate methodologies present in literature to act as a jumping-off point for researchers exploring this area.
arXiv Detail & Related papers (2020-11-27T03:05:59Z)
- Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning [5.109216329453963]
We introduce Document Topic Modelling and Document Shuffle Prediction as novel pre-training tasks.
We utilize the Longformer network architecture as the backbone to encode the multi-modal information from multi-page documents in an end-to-end fashion.
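As an illustration only, a shuffle-prediction pre-training example could be constructed as below; the 50/50 shuffling rate and the binary label are assumptions, not the authors' exact recipe.

```python
# Minimal sketch of how a "document shuffle prediction" pre-training example
# might be built: randomly permute a document's pages half the time and ask
# the model to predict whether the page order was disturbed. This is an
# assumed formulation for illustration, not the paper's implementation.
import random
from typing import List, Tuple


def make_shuffle_example(pages: List[str], rng: random.Random) -> Tuple[List[str], int]:
    """Return (possibly shuffled pages, label); label 1 means shuffled."""
    if len(pages) < 2 or rng.random() < 0.5:
        return pages, 0  # original order preserved
    shuffled = pages[:]
    while shuffled == pages:  # ensure the permutation actually changes the order
        rng.shuffle(shuffled)
    return shuffled, 1


rng = random.Random(0)
pages = ["page 1 text", "page 2 text", "page 3 text"]
inputs, label = make_shuffle_example(pages, rng)
print(label, inputs)
```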
arXiv Detail & Related papers (2020-09-30T05:39:04Z)