Asking questions on handwritten document collections
- URL: http://arxiv.org/abs/2110.00711v1
- Date: Sat, 2 Oct 2021 02:40:40 GMT
- Title: Asking questions on handwritten document collections
- Authors: Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas and CV Jawahar
- Abstract summary: This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
- Score: 35.85762649504866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work addresses the problem of Question Answering (QA) on handwritten
document collections. Unlike typical QA and Visual Question Answering (VQA)
formulations where the answer is a short text, we aim to locate a document
snippet where the answer lies. The proposed approach works without recognizing
the text in the documents. We argue that the recognition-free approach is
suitable for handwritten documents and historical collections where robust text
recognition is often difficult. At the same time, for human users, document
image snippets containing answers act as a valid alternative to textual
answers. The proposed approach uses an off-the-shelf deep embedding network
which can project both textual words and word images into a common sub-space.
This embedding bridges the textual and visual domains and helps us retrieve
document snippets that potentially answer a question. We evaluate the
proposed approach on two new datasets: (i) HW-SQuAD, a synthetic,
handwritten document image counterpart of the SQuAD1.0 dataset, and (ii)
BenthamQA, a smaller set of QA pairs defined on documents from the popular
Bentham manuscripts collection. We also present a thorough analysis of the
proposed recognition-free approach compared to a recognition-based approach
that uses text recognized from the images by an OCR system. The datasets
presented in this work are available for download at docvqa.org.
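To make the retrieval step concrete, below is a minimal sketch of recognition-free snippet ranking in a common word/word-image embedding space. Everything here is an illustrative assumption: `phoc_like_embedding` (hashed character bigrams) merely stands in for the paper's learned, off-the-shelf deep embedding network, and the demo fakes word-image embeddings by embedding transcriptions.

```python
import numpy as np

def phoc_like_embedding(word: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for the shared embedding space: hashed character
    bigrams, unit-normalized. The actual system learns a network that
    projects textual words AND word images into one common sub-space."""
    vec = np.zeros(dim)
    padded = f"#{word.lower()}#"
    for i in range(len(padded) - 1):
        vec[hash(padded[i:i + 2]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def rank_snippets(question_words, snippets, top_k=3):
    """Score each snippet by summing, per question word, its best cosine
    match against the word vectors of that snippet. In the real pipeline
    the snippet vectors come from word *images*, so no OCR is needed."""
    q = np.stack([phoc_like_embedding(w) for w in question_words])
    scored = []
    for snippet_id, word_vecs in snippets.items():
        sims = q @ np.stack(word_vecs).T   # unit vectors -> cosine similarity
        scored.append((float(sims.max(axis=1).sum()), snippet_id))
    return [sid for _, sid in sorted(scored, reverse=True)[:top_k]]

# Demo: word-image embeddings faked by embedding transcriptions.
snippets = {
    "page1_para2": [phoc_like_embedding(w) for w in "bentham wrote letters".split()],
    "page3_para1": [phoc_like_embedding(w) for w in "weather report london".split()],
}
print(rank_snippets(["bentham", "letters"], snippets, top_k=1))  # ['page1_para2']
```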
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
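One plausible reading of the first method above, sketched as an InfoNCE-style loss in which the positive document's corpus neighbors are appended to the in-batch negatives; this exact formulation is an assumption, not the paper's published objective.

```python
import numpy as np

def contrastive_loss_with_neighbors(q, d_pos, d_negs, d_neighbors, tau=0.05):
    """InfoNCE-style loss over unit-normalized embeddings (1-D arrays):
    besides the usual in-batch negatives `d_negs`, the corpus neighbors
    of the positive document are added as extra (hard) negatives."""
    candidates = [d_pos] + list(d_negs) + list(d_neighbors)
    logits = np.array([q @ c for c in candidates]) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # the positive sits at index 0
```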
- Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format.
We first craft Wiki-SS, a corpus of 1.3M Wikipedia web page screenshots, to answer questions from the Natural Questions dataset.
In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
arXiv Detail & Related papers (2024-06-17T06:27:35Z)
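A hedged sketch of the bi-encoder retrieval DSE describes: page screenshots are embedded directly from pixels and ranked by dot product against the question embedding. The encoder callables are placeholders, not DSE's actual models.

```python
import numpy as np

def dse_retrieve(question, pages, encode_question, encode_screenshot, top_k=5):
    """Embed every page screenshot once, embed the question into the same
    space, and rank pages by dot product. Because the input is the raw
    screenshot, text, tables, and figures need no format-specific parsing."""
    q = encode_question(question)
    index = {pid: encode_screenshot(img) for pid, img in pages.items()}
    ranked = sorted(index, key=lambda pid: -float(q @ index[pid]))
    return ranked[:top_k]
```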
- Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering [6.224211330728391]
Researchers produce thousands of scholarly documents containing valuable technical knowledge.
Document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge.
We present a three-stage document QA approach: text extraction from PDF; evidence retrieval from extracted texts to form well-posed contexts; and QA to extract knowledge from contexts to return high-quality answers.
arXiv Detail & Related papers (2022-10-04T23:33:52Z)
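The three-stage pipeline above, reduced to a skeleton; `extract_text`, `retrieve`, and `answer` are placeholder callables standing in for the paper's components.

```python
def document_qa(pdf_path, question, extract_text, retrieve, answer, k=5):
    """Stage 1: extract passages from the PDF. Stage 2: retrieve the k
    passages most relevant to the question to form a well-posed context.
    Stage 3: run a reader over that context to produce the answer."""
    passages = extract_text(pdf_path)                     # -> list[str]
    context = "\n".join(retrieve(question, passages, k))  # evidence retrieval
    return answer(question, context)                      # extractive QA
```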
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
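GenRead's generate-then-read loop as a minimal sketch; the prompt wording and the `llm` / `reader` callables are assumptions, not the paper's exact prompts or models.

```python
def generate_then_read(question, llm, reader, n_docs=3):
    """Instead of retrieving evidence, prompt a large language model to
    *generate* contextual documents for the question, then read the
    generated documents to produce the final answer."""
    prompt = f"Generate a background document to answer the question: {question}"
    docs = [llm(prompt) for _ in range(n_docs)]  # sampled contextual documents
    return reader(question, "\n\n".join(docs))
```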
- TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation [55.83319599681002]
Text-VQA aims at answering questions that require understanding the textual cues in an image.
We develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.
arXiv Detail & Related papers (2022-08-03T02:18:09Z)
- V-Doc: Visual questions answers with Documents [1.6785823565413143]
V-Doc is a question-answering tool for document images and PDF files.
It supports generating and using both extractive and abstractive question-answer pairs.
arXiv Detail & Related papers (2022-05-27T02:38:09Z)
- Recognition-free Question Answering on Handwritten Document Collections [3.0969191504482247]
We present a recognition-free Question Answering approach for handwritten documents.
Our approaches outperform the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.
arXiv Detail & Related papers (2022-02-12T14:47:44Z)
- Combining Deep Learning and Reasoning for Address Detection in Unstructured Text Documents [0.0]
We propose a hybrid approach that combines deep learning with reasoning for finding and extracting addresses from unstructured text documents.
We use a visual deep learning model to detect the boundaries of possible address regions on the scanned document images.
arXiv Detail & Related papers (2022-02-07T12:32:00Z)
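A rough sketch of the detect-then-reason flow described above; the detector, the region OCR, and the toy postcode rule are all illustrative placeholders rather than the paper's components.

```python
import re

POSTCODE_RULE = re.compile(r"\b\d{5}\b")  # toy stand-in for symbolic reasoning

def extract_addresses(page_image, detect_regions, ocr_region):
    """Stage 1: a visual deep-learning detector proposes candidate address
    regions on the scanned page. Stage 2: each OCR'd candidate is accepted
    or rejected by rule-based reasoning (here, a single postcode regex)."""
    addresses = []
    for box in detect_regions(page_image):   # candidate bounding boxes
        text = ocr_region(page_image, box)   # read only the proposed region
        if POSTCODE_RULE.search(text):       # symbolic validation
            addresses.append((box, text.strip()))
    return addresses
```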
- Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method which aims to improve relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)
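A minimal sketch of the document-scoring idea above, assuming unit-normalized embeddings; the linear mix and `alpha` are illustrative choices, not KAQA's actual formulation.

```python
import numpy as np

def kaqa_document_scores(q_vec, doc_vecs, alpha=0.3):
    """Mix each candidate document's relevance to the question with its
    agreement with the other retrieved documents, so mutually supporting
    documents rise together (self-similarity kept for simplicity)."""
    D = np.stack(doc_vecs)              # (n_docs, dim), unit-normalized
    relevance = D @ q_vec               # question-document relevance
    support = (D @ D.T).mean(axis=1)    # average agreement with other docs
    return relevance + alpha * support  # higher scores rank earlier
```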
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.