Recognition-free Question Answering on Handwritten Document Collections
- URL: http://arxiv.org/abs/2202.06080v1
- Date: Sat, 12 Feb 2022 14:47:44 GMT
- Title: Recognition-free Question Answering on Handwritten Document Collections
- Authors: Oliver Tüselmann, Friedrich Müller, Fabian Wolf and Gernot A. Fink
- Abstract summary: We present a recognition-free Question Answering approach for handwritten documents.
Our approaches outperform the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.
- Score: 3.0969191504482247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, considerable progress has been made in the research area of
Question Answering (QA) on document images. Current QA approaches from the
Document Image Analysis community are mainly focusing on machine-printed
documents and perform rather poorly on handwriting. This is mainly due to the
reduced recognition performance on handwritten documents. To tackle this
problem, we propose a recognition-free QA approach, especially designed for
handwritten document image collections. We present a robust document retrieval
method, as well as two QA models. Our approaches outperform the
state-of-the-art recognition-free models on the challenging BenthamQA and
HW-SQuAD datasets.
Related papers
- DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification [5.247930659596986]
This paper introduces DocXplain, a novel model-agnostic explainability method specifically designed for generating high interpretability feature attribution maps.
We extensively evaluate our proposed approach in the context of document image classification, utilizing 4 different evaluation metrics.
To the best of the authors' knowledge, this work presents the first model-agnostic attribution-based explainability method specifically tailored for document images.
arXiv Detail & Related papers (2024-07-04T10:59:15Z)
- HiQA: A Hierarchical Contextual Augmentation RAG for Multi-Documents QA [13.000411428297813]
We present HiQA, an advanced multi-document question-answering (MDQA) framework that integrates cascading metadata into content and a multi-route retrieval mechanism.
We also release a benchmark called MasQA to support evaluation and research in MDQA.
arXiv Detail & Related papers (2024-02-01T02:24:15Z)
- DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures [24.76258692552673]
We present DocAligner, a novel method that streamlines the manual annotation process to a simple step of taking pictures.
It achieves this by establishing dense correspondence between photographic document images and their clean counterparts.
Considering the distinctive characteristics of document images, DocAligner incorporates several innovative features.
arXiv Detail & Related papers (2023-06-09T08:29:15Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model using a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Asking questions on handwritten document collections [35.85762649504866]
This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
arXiv Detail & Related papers (2021-10-02T02:40:40Z)
- Enhance to Read Better: An Improved Generative Adversarial Network for Handwritten Document Image Enhancement [1.7491858164568674]
We propose an end to end architecture based on Generative Adversarial Networks (GANs) to recover degraded documents into a clean and readable form.
To the best of our knowledge, this is the first work to use the text information while binarizing handwritten documents.
We outperform the state of the art in the H-DIBCO 2018 challenge after fine-tuning our pre-trained model with synthetically degraded Latin handwritten images.
arXiv Detail & Related papers (2021-05-26T17:44:45Z)
- Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method that aims to improve relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.