Asking questions on handwritten document collections
- URL: http://arxiv.org/abs/2110.00711v1
- Date: Sat, 2 Oct 2021 02:40:40 GMT
- Title: Asking questions on handwritten document collections
- Authors: Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas and CV Jawahar
- Abstract summary: This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
- Score: 35.85762649504866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work addresses the problem of Question Answering (QA) on handwritten
document collections. Unlike typical QA and Visual Question Answering (VQA)
formulations where the answer is a short text, we aim to locate a document
snippet where the answer lies. The proposed approach works without recognizing
the text in the documents. We argue that the recognition-free approach is
suitable for handwritten documents and historical collections where robust text
recognition is often difficult. At the same time, for human users, document
image snippets containing answers act as a valid alternative to textual
answers. The proposed approach uses an off-the-shelf deep embedding network
which can project both textual words and word images into a common sub-space.
This embedding bridges the textual and visual domains and helps us retrieve
document snippets that potentially answer a question. We evaluate the
proposed approach on two new datasets: (i) HW-SQuAD, a synthetic,
handwritten document image counterpart of the SQuAD1.0 dataset, and (ii)
BenthamQA, a smaller set of QA pairs defined on documents from the popular
Bentham manuscripts collection. We also present a thorough analysis of the
proposed recognition-free approach compared to a recognition-based approach
that uses text recognized from the images by an OCR system. The datasets
presented in this work are available for download at docvqa.org.
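To make the retrieval step concrete, below is a minimal sketch of recognition-free snippet ranking in a common word/word-image embedding space. Everything here is an illustrative assumption: `phoc_like_embedding` (hashed character bigrams) merely stands in for the paper's learned, off-the-shelf deep embedding network, and the demo fakes word-image embeddings by embedding transcriptions.

```python
import numpy as np

def phoc_like_embedding(word: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for the shared embedding space: hashed character
    bigrams, unit-normalized. The actual system learns a network that
    projects textual words AND word images into one common sub-space."""
    vec = np.zeros(dim)
    padded = f"#{word.lower()}#"
    for i in range(len(padded) - 1):
        vec[hash(padded[i:i + 2]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def rank_snippets(question_words, snippets, top_k=3):
    """Score each snippet by summing, per question word, its best cosine
    match against the word vectors of that snippet. In the real pipeline
    the snippet vectors come from word *images*, so no OCR is needed."""
    q = np.stack([phoc_like_embedding(w) for w in question_words])
    scored = []
    for snippet_id, word_vecs in snippets.items():
        sims = q @ np.stack(word_vecs).T   # unit vectors -> cosine similarity
        scored.append((float(sims.max(axis=1).sum()), snippet_id))
    return [sid for _, sid in sorted(scored, reverse=True)[:top_k]]

# Demo: word-image embeddings faked by embedding transcriptions.
snippets = {
    "page1_para2": [phoc_like_embedding(w) for w in "bentham wrote letters".split()],
    "page3_para1": [phoc_like_embedding(w) for w in "weather report london".split()],
}
print(rank_snippets(["bentham", "letters"], snippets, top_k=1))  # ['page1_para2']
```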
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
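One plausible reading of the first method above, sketched as an InfoNCE-style loss in which the positive document's corpus neighbors are appended to the in-batch negatives; this exact formulation is an assumption, not the paper's published objective.

```python
import numpy as np

def contrastive_loss_with_neighbors(q, d_pos, d_negs, d_neighbors, tau=0.05):
    """InfoNCE-style loss over unit-normalized embeddings (1-D arrays):
    besides the usual in-batch negatives `d_negs`, the corpus neighbors
    of the positive document are added as extra (hard) negatives."""
    candidates = [d_pos] + list(d_negs) + list(d_neighbors)
    logits = np.array([q @ c for c in candidates]) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # the positive sits at index 0
```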
- Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format.
We first craft Wiki-SS, a corpus of 1.3M Wikipedia web page screenshots, to answer questions from the Natural Questions dataset.
In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
arXiv Detail & Related papers (2024-06-17T06:27:35Z)
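A hedged sketch of the bi-encoder retrieval DSE describes: page screenshots are embedded directly from pixels and ranked by dot product against the question embedding. The encoder callables are placeholders, not DSE's actual models.

```python
import numpy as np

def dse_retrieve(question, pages, encode_question, encode_screenshot, top_k=5):
    """Embed every page screenshot once, embed the question into the same
    space, and rank pages by dot product. Because the input is the raw
    screenshot, text, tables, and figures need no format-specific parsing."""
    q = encode_question(question)
    index = {pid: encode_screenshot(img) for pid, img in pages.items()}
    ranked = sorted(index, key=lambda pid: -float(q @ index[pid]))
    return ranked[:top_k]
```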
- Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering [6.224211330728391]
Researchers produce thousands of scholarly documents containing valuable technical knowledge.
Document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge.
We present a three-stage document QA approach: text extraction from PDF; evidence retrieval from extracted texts to form well-posed contexts; and QA to extract knowledge from contexts to return high-quality answers.
arXiv Detail & Related papers (2022-10-04T23:33:52Z)
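The three-stage pipeline above, reduced to a skeleton; `extract_text`, `retrieve`, and `answer` are placeholder callables standing in for the paper's components.

```python
def document_qa(pdf_path, question, extract_text, retrieve, answer, k=5):
    """Stage 1: extract passages from the PDF. Stage 2: retrieve the k
    passages most relevant to the question to form a well-posed context.
    Stage 3: run a reader over that context to produce the answer."""
    passages = extract_text(pdf_path)                     # -> list[str]
    context = "\n".join(retrieve(question, passages, k))  # evidence retrieval
    return answer(question, context)                      # extractive QA
```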
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
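GenRead's generate-then-read loop as a minimal sketch; the prompt wording and the `llm` / `reader` callables are assumptions, not the paper's exact prompts or models.

```python
def generate_then_read(question, llm, reader, n_docs=3):
    """Instead of retrieving evidence, prompt a large language model to
    *generate* contextual documents for the question, then read the
    generated documents to produce the final answer."""
    prompt = f"Generate a background document to answer the question: {question}"
    docs = [llm(prompt) for _ in range(n_docs)]  # sampled contextual documents
    return reader(question, "\n\n".join(docs))
```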
- TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation [55.83319599681002]
Text-VQA aims at answering questions that require understanding the textual cues in an image.
We develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.
arXiv Detail & Related papers (2022-08-03T02:18:09Z)
- V-Doc: Visual questions answers with Documents [1.6785823565413143]
V-Doc is a question-answering tool for document images and PDF files.
It supports generating and using both extractive and abstractive question-answer pairs.
arXiv Detail & Related papers (2022-05-27T02:38:09Z)
- Recognition-free Question Answering on Handwritten Document Collections [3.0969191504482247]
We present a recognition-free Question Answering approach for handwritten documents.
Our approaches outperform the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.
arXiv Detail & Related papers (2022-02-12T14:47:44Z)
- Combining Deep Learning and Reasoning for Address Detection in Unstructured Text Documents [0.0]
We propose a hybrid approach that combines deep learning with reasoning for finding and extracting addresses from unstructured text documents.
We use a visual deep learning model to detect the boundaries of possible address regions on the scanned document images.
arXiv Detail & Related papers (2022-02-07T12:32:00Z)
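A rough sketch of the detect-then-reason flow described above; the detector, the region OCR, and the toy postcode rule are all illustrative placeholders rather than the paper's components.

```python
import re

POSTCODE_RULE = re.compile(r"\b\d{5}\b")  # toy stand-in for symbolic reasoning

def extract_addresses(page_image, detect_regions, ocr_region):
    """Stage 1: a visual deep-learning detector proposes candidate address
    regions on the scanned page. Stage 2: each OCR'd candidate is accepted
    or rejected by rule-based reasoning (here, a single postcode regex)."""
    addresses = []
    for box in detect_regions(page_image):   # candidate bounding boxes
        text = ocr_region(page_image, box)   # read only the proposed region
        if POSTCODE_RULE.search(text):       # symbolic validation
            addresses.append((box, text.strip()))
    return addresses
```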
- Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method which aims to improve relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)
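A minimal sketch of the document-scoring idea above, assuming unit-normalized embeddings; the linear mix and `alpha` are illustrative choices, not KAQA's actual formulation.

```python
import numpy as np

def kaqa_document_scores(q_vec, doc_vecs, alpha=0.3):
    """Mix each candidate document's relevance to the question with its
    agreement with the other retrieved documents, so mutually supporting
    documents rise together (self-similarity kept for simplicity)."""
    D = np.stack(doc_vecs)              # (n_docs, dim), unit-normalized
    relevance = D @ q_vec               # question-document relevance
    support = (D @ D.T).mean(axis=1)    # average agreement with other docs
    return relevance + alpha * support  # higher scores rank earlier
```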
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.