V-Doc : Visual questions answers with Documents
- URL: http://arxiv.org/abs/2205.13724v2
- Date: Tue, 31 May 2022 03:33:33 GMT
- Title: V-Doc : Visual questions answers with Documents
- Authors: Yihao Ding, Zhe Huang, Runlin Wang, Yanhang Zhang, Xianru Chen,
Yuzhong Ma, Hyunsuk Chung and Soyeon Caren Han
- Abstract summary: V-Doc is a question-answering tool using document images and PDF.
It supports generating and using both extractive and abstractive question-answer pairs.
- Score: 1.6785823565413143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose V-Doc, a question-answering tool that works with document
images and PDFs, aimed mainly at researchers and general users without deep
learning expertise who want to generate, process, and understand document
visual question answering tasks. V-Doc supports generating and using both
extractive and abstractive question-answer pairs over document images.
Extractive QA selects a subset of tokens or phrases from the document contents
to predict the answer, while abstractive QA interprets the language in the
content and generates the answer with a trained model. Both are crucial to
understanding documents, especially in image format. We include a detailed
scenario of question generation for the abstractive QA task. V-Doc supports a
wide range of datasets and models, and is highly extensible through a
declarative, framework-agnostic platform.
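Below is a minimal sketch of the two QA modes described in the abstract, assuming the document text has already been extracted (e.g. by OCR). The Hugging Face models and the example document are illustrative placeholders, not the components shipped with V-Doc.

```python
# A minimal sketch of extractive vs. abstractive document QA (not V-Doc's code).
# The model names and the example document text below are placeholder assumptions.
from transformers import pipeline

# Document text assumed to be already extracted from the document image, e.g. by OCR.
document_text = (
    "Invoice number: 10482. Total amount due: $1,250.00. "
    "Payment is expected within 30 days of the issue date."
)
question = "What is the total amount due?"

# Extractive QA: predict a span (a subset of tokens) taken from the document contents.
extractive_qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
span = extractive_qa(question=question, context=document_text)
print("extractive answer:", span["answer"])

# Abstractive QA: generate free-form answer text conditioned on the document.
abstractive_qa = pipeline("text2text-generation", model="google/flan-t5-base")
prompt = f"question: {question} context: {document_text}"
generated = abstractive_qa(prompt, max_new_tokens=32)
print("abstractive answer:", generated[0]["generated_text"])
```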
Related papers
- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding [63.33447665725129]
We introduce M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts.
M3DocRAG can efficiently handle single or many documents while preserving visual information.
We also present M3DocVQA, a new benchmark for evaluating open-domain DocVQA over 3,000+ PDF documents with 40,000+ pages.
arXiv Detail & Related papers (2024-11-07T18:29:38Z)
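The following is a rough, hypothetical sketch of the multi-modal retrieve-then-read idea summarised above, not the authors' implementation; embed_page_image, embed_question, and answer_with_multimodal_lm stand in for an unspecified multi-modal encoder and vision-language reader.

```python
# Hypothetical sketch of a multi-modal retrieve-then-read loop in the spirit of
# the summary above (not the authors' code). embed_page_image, embed_question and
# answer_with_multimodal_lm are assumed stand-ins for a multi-modal encoder and a
# vision-language reader.
import numpy as np

def retrieve_pages(question, page_images, embed_page_image, embed_question, k=4):
    """Rank every PDF page image against the question and keep the top-k pages."""
    q = embed_question(question)                                  # shape (d,)
    pages = np.stack([embed_page_image(p) for p in page_images])  # shape (n_pages, d)
    # Cosine similarity between the question embedding and each page embedding.
    scores = pages @ q / (np.linalg.norm(pages, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-scores)[:k]
    return [page_images[i] for i in top]

def answer(question, page_images, embed_page_image, embed_question,
           answer_with_multimodal_lm):
    # Read only the retrieved pages, keeping them as images so visual
    # information (layout, figures, tables) is preserved.
    context_pages = retrieve_pages(question, page_images,
                                   embed_page_image, embed_question)
    return answer_with_multimodal_lm(question, context_pages)
```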
- JDocQA: Japanese Document Question Answering Dataset for Generative Language Models [15.950718839723027]
We introduce Japanese Document Question Answering (JDocQA), a large-scale document-based QA dataset.
It comprises 5,504 documents in PDF format and 11,600 annotated question-and-answer instances in Japanese.
We incorporate multiple categories of questions, including unanswerable ones, to reflect realistic question-answering applications.
arXiv Detail & Related papers (2024-03-28T14:22:54Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these richly structured documents.
We propose PDFTriage, which enables models to retrieve context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- Workshop on Document Intelligence Understanding [3.2929609168290543]
This workshop aims to bring together researchers and industry developers in the field of document intelligence.
We also released a data challenge on the recently introduced document-level VQA dataset, PDFVQA.
arXiv Detail & Related papers (2023-07-31T02:14:25Z)
- PDFVQA: A New Dataset for Real-World VQA on PDF Documents [2.105395241374678]
Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions.
Our PDF-VQA dataset extends document understanding beyond the single-page scale to questions asked over full, multi-page documents.
arXiv Detail & Related papers (2023-04-13T12:28:14Z)
- Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering [6.224211330728391]
Researchers produce thousands of scholarly documents containing valuable technical knowledge.
Document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge.
We present a three-stage document QA approach: text extraction from PDF; evidence retrieval from extracted texts to form well-posed contexts; and QA to extract knowledge from contexts to return high-quality answers.
arXiv Detail & Related papers (2022-10-04T23:33:52Z)
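A hedged outline of the three-stage extract/retrieve/answer pipeline summarised above; the concrete components chosen here (pypdf for text extraction, TF-IDF retrieval, a SQuAD-style extractive reader) are illustrative assumptions, not the paper's own modules.

```python
# Hedged sketch of a three-stage document QA pipeline (extract -> retrieve -> answer).
# pypdf, TF-IDF retrieval, and a SQuAD-style reader are illustrative choices only.
from pypdf import PdfReader
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

def extract_passages(pdf_path):
    """Stage 1: pull raw text out of the PDF, one passage per page."""
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]

def retrieve_context(question, passages, k=3):
    """Stage 2: keep the k passages most similar to the question."""
    vectorizer = TfidfVectorizer().fit(passages + [question])
    scores = cosine_similarity(vectorizer.transform([question]),
                               vectorizer.transform(passages))[0]
    top = scores.argsort()[::-1][:k]
    return " ".join(passages[i] for i in top)

def answer(question, pdf_path):
    """Stage 3: run an extractive reader over the retrieved context."""
    context = retrieve_context(question, extract_passages(pdf_path))
    reader = pipeline("question-answering", model="deepset/roberta-base-squad2")
    return reader(question=question, context=context)["answer"]
```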
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
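A minimal sketch of the generate-then-read recipe summarised above: prompt a language model to generate a contextual document for the question, then read that document to produce the answer. The flan-t5 model and the prompt wording are placeholder assumptions, not the paper's exact setup.

```python
# Minimal sketch of the generate-then-read recipe summarised above (not the
# authors' implementation). The flan-t5 model and prompts are placeholders.
from transformers import pipeline

llm = pipeline("text2text-generation", model="google/flan-t5-base")

def generate_then_read(question):
    # Step 1: prompt the language model to *generate* a contextual document
    # for the question instead of retrieving one from an external corpus.
    context_prompt = f"Generate a background document that answers the question: {question}"
    context = llm(context_prompt, max_new_tokens=128)[0]["generated_text"]

    # Step 2: read the generated document to produce the final answer.
    read_prompt = (f"Answer the question based on the document.\n"
                   f"Document: {context}\nQuestion: {question}")
    return llm(read_prompt, max_new_tokens=32)[0]["generated_text"]

print(generate_then_read("Who wrote the novel Dune?"))
```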
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account information from multiple modalities, including text, layout, and visual appearance, to address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- Asking questions on handwritten document collections [35.85762649504866]
This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
arXiv Detail & Related papers (2021-10-02T02:40:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.