DLUE: Benchmarking Document Language Understanding
- URL: http://arxiv.org/abs/2305.09520v1
- Date: Tue, 16 May 2023 15:16:24 GMT
- Title: DLUE: Benchmarking Document Language Understanding
- Authors: Ruoxi Xu, Hongyu Lin, Xinyan Guan, Xianpei Han, Yingfei Sun, Le Sun
- Abstract summary: There is no well-established consensus on how to comprehensively evaluate document understanding abilities.
This paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription.
Under the new evaluation framework, we propose Document Language Understanding Evaluation (DLUE), a new task suite.
- Score: 32.550855843975484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding documents is central to many real-world tasks but remains a
challenging topic. Unfortunately, there is no well-established consensus on how
to comprehensively evaluate document understanding abilities, which
significantly hinders fair comparison and measurement of the field's progress.
To benchmark document understanding research, this paper summarizes
four representative abilities, i.e., document classification, document
structural analysis, document information extraction, and document
transcription. Under the new evaluation framework, we propose Document
Language Understanding Evaluation (DLUE), a new task suite which covers a wide
range of tasks in various forms, domains, and document genres. We
also systematically evaluate six well-established transformer models on DLUE,
and find that, owing to lengthy content, complicated underlying structure, and
dispersed knowledge, document understanding is still far from solved. Moreover,
no current neural architecture dominates all tasks, which points to the need
for a universal document understanding architecture.
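The abstract's point about lengthy content can be made concrete with a simple chunk-and-aggregate baseline for the document classification ability. The snippet below is a minimal sketch, not the paper's evaluation protocol: the checkpoint name, the four-label head, and the chunking parameters are placeholder assumptions.

```python
# Minimal sketch: chunk-and-aggregate classification for over-length documents.
# Assumptions (not from the paper): a generic Hugging Face checkpoint and a
# dummy long document standing in for a DLUE-style classification example.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder backbone, not a DLUE baseline
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=4)
model.eval()

def classify_long_document(text: str, max_len: int = 512, stride: int = 128) -> int:
    """Split an over-length document into overlapping chunks, score each chunk,
    and average the logits -- one common workaround for fixed input windows."""
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_len,
        stride=stride,
        return_overflowing_tokens=True,
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
    mean_logits = out.logits.mean(dim=0)  # aggregate over all chunks
    return int(mean_logits.argmax())

# Example usage with a dummy long document.
predicted_label = classify_long_document("Lorem ipsum dolor sit amet. " * 2000)
print(predicted_label)
```

Averaging chunk logits is only one aggregation choice; max-pooling or a learned attention over chunks are common alternatives, and long-input architectures sidestep the chunking altogether.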
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models [63.466265039007816]
We present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community.
We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these richly structured documents.
We propose PDFTriage, which enables models to retrieve context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- Workshop on Document Intelligence Understanding [3.2929609168290543]
This workshop aims to bring together researchers and industry developers in the field of document intelligence.
We also released a data challenge on the recently introduced document-level VQA dataset, PDFVQA.
arXiv Detail & Related papers (2023-07-31T02:14:25Z)
- A Study of Documentation for Software Architecture [7.011803832284996]
We asked 65 participants to answer software architecture understanding questions.
Answers to questions that require applying and creating activities were statistically significantly associated with the use of the system's source code.
We conclude that, in the limited experimental context studied, our results contradict the hypothesis that the format of architectural documentation matters.
arXiv Detail & Related papers (2023-05-26T22:14:53Z)
- PDFVQA: A New Dataset for Real-World VQA on PDF Documents [2.105395241374678]
Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions.
Our PDF-VQA dataset extends document understanding from the current single-page scale to questions posed over full, multi-page documents.
arXiv Detail & Related papers (2023-04-13T12:28:14Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input (a minimal illustrative sketch of this idea follows the list below).
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues [0.7425558351422133]
We measure the impact of incorporating visual cues, obtained via computer vision methods, on the accuracy of document understanding tasks.
Our method of segmenting documents based on structural metadata out-performs existing methods on four long-document understanding tasks.
arXiv Detail & Related papers (2021-07-16T21:21:50Z)
- Timestamping Documents and Beliefs [1.4467794332678539]
Document dating is a challenging problem which requires inference over the temporal structure of the document.
In this paper we propose NeuralDater, a Graph Convolutional Network (GCN) based document dating approach.
We also propose AD3: Attentive Deep Document Dater, an attention-based document dating system.
arXiv Detail & Related papers (2021-06-09T02:12:18Z)
- Document-level Neural Machine Translation with Document Embeddings [82.4684444847092]
This work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings.
The proposed document-aware NMT is implemented to enhance the Transformer baseline by introducing both global and local document-level clues on the source end.
arXiv Detail & Related papers (2020-09-16T19:43:29Z)
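The UDoc entry above describes extending the Transformer to take multimodal embeddings as input. The following is a minimal, hypothetical sketch of that general idea (text token, 2D bounding-box, and region-feature embeddings summed before a standard Transformer encoder); the module names, dimensions, and sum-based fusion are assumptions and do not reproduce UDoc's published architecture or its three self-supervised losses.

```python
# Illustrative-only sketch of a Transformer that consumes multimodal token
# embeddings (text id + 2D layout box + region visual feature), in the spirit
# of the UDoc entry above. Names, dimensions, and fusion by summation are
# assumptions, not UDoc's published design.
import torch
import torch.nn as nn

class MultimodalDocEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, visual_dim=2048,
                 num_layers=4, num_heads=8, coord_bins=1000):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)
        # One embedding table per bounding-box coordinate (x0, y0, x1, y1).
        self.box_emb = nn.ModuleList(
            [nn.Embedding(coord_bins, hidden) for _ in range(4)]
        )
        self.vis_proj = nn.Linear(visual_dim, hidden)  # project region features
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids, boxes, visual_feats):
        # token_ids: [B, T]; boxes: [B, T, 4] integer coords in [0, coord_bins);
        # visual_feats: [B, T, visual_dim] region features aligned to tokens.
        x = self.tok_emb(token_ids)
        for i, emb in enumerate(self.box_emb):
            x = x + emb(boxes[..., i])
        x = x + self.vis_proj(visual_feats)
        return self.encoder(x)  # [B, T, hidden] contextualized multimodal states

# Dummy forward pass with random inputs.
enc = MultimodalDocEncoder()
out = enc(torch.randint(0, 30522, (2, 16)),
          torch.randint(0, 1000, (2, 16, 4)),
          torch.randn(2, 16, 2048))
print(out.shape)  # torch.Size([2, 16, 256])
```

In practice such encoders are pretrained with self-supervised objectives over large document collections before being fine-tuned on downstream document understanding tasks.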
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.