Advancing Question Answering on Handwritten Documents: A State-of-the-Art Recognition-Based Model for HW-SQuAD
- URL: http://arxiv.org/abs/2406.17437v2
- Date: Mon, 15 Jul 2024 14:41:32 GMT
- Title: Advancing Question Answering on Handwritten Documents: A State-of-the-Art Recognition-Based Model for HW-SQuAD
- Authors: Aniket Pal, Ajoy Mondal, C. V. Jawahar
- Abstract summary: This paper proposes a novel recognition-based approach that improves upon the previous state-of-the-art on the HW-SQuAD and BenthamQA datasets.
Our model incorporates transformer-based document retrieval and ensemble methods at the model level, achieving Exact Match scores of 82.02% and 69% on the HW-SQuAD and BenthamQA datasets, respectively.
- Score: 30.559280110711143
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Question answering on handwritten documents is a challenging task with numerous real-world applications. This paper proposes a novel recognition-based approach that improves upon the previous state-of-the-art on the HW-SQuAD and BenthamQA datasets. Our model incorporates transformer-based document retrieval and ensemble methods at the model level, achieving Exact Match scores of 82.02% and 69% on the HW-SQuAD and BenthamQA datasets, respectively, surpassing the previous best recognition-based approach by 10.89% and 3%. We also enhance the document retrieval component, boosting the top-5 retrieval accuracy from 90% to 95.30%. Our results demonstrate the significance of our proposed approach in advancing question answering on handwritten documents. The code and trained models will be publicly available to facilitate future research in this critical area of natural language processing.
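Exact Match is the headline metric quoted above. A minimal SQuAD-style sketch of how EM is typically computed (the normalization steps below are the usual convention, an assumption rather than the paper's exact procedure):

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse
    whitespace (the usual SQuAD-style answer normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """A prediction counts as an exact match if it equals any gold
    answer after normalization."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

# Aggregate EM (in percent) over a toy prediction set
preds = ["The Bentham papers", "1832"]
golds = [["Bentham Papers"], ["1831"]]
em = 100 * sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(preds)
```

Scores such as 82.02% are this percentage aggregated over the full evaluation set.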
Related papers
- OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive [50.468138755368805]
The opioid crisis represents a significant moment in public health, documented by the data and records disclosed in the UCSF-JHU Opioid Industry Documents Archive (OIDA).
In this paper, we tackle this challenge by organizing the original dataset according to document attributes.
arXiv Detail & Related papers (2025-11-13T03:27:32Z) - DocReward: A Document Reward Model for Structuring and Stylizing [107.03974018371058]
DocReward is a document reward model that evaluates documents based on their structure and style.
It is trained using the Bradley-Terry loss to score documents, penalizing predictions that contradict the annotated ranking.
It achieves a significantly higher win rate of 60.8%, compared to GPT-5's 37.7% win rate.
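The Bradley-Terry loss named in this entry has a standard pairwise form; a minimal sketch (generic notation, not DocReward's exact implementation):

```python
import math

def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss: the negative log-probability that the
    annotated-preferred document beats the rejected one, modeling
    P(i beats j) = sigmoid(s_i - s_j)."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))
```

The loss is small when the model scores the preferred document well above the rejected one, and large when the predicted ordering contradicts the annotated ranking.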
arXiv Detail & Related papers (2025-10-13T13:36:32Z) - ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval [16.434748534272014]
ASRank is a new re-ranking method based on scoring retrieved documents using zero-shot answer scent.
It increases Top-1 retrieval accuracy on NQ from 19.2% to 46.5% for MSS and from 22.1% to 47.3% for BM25.
arXiv Detail & Related papers (2025-01-25T15:27:40Z) - ECLIPSE: Contrastive Dimension Importance Estimation with Pseudo-Irrelevance Feedback for Dense Retrieval [14.72046677914345]
Recent advances in Information Retrieval have leveraged high-dimensional embedding spaces to improve the retrieval of relevant documents.
Despite these high-dimensional representations, documents relevant to a query reside on a lower-dimensional, query-dependent manifold.
We propose a novel methodology that addresses these limitations by leveraging information from both relevant and non-relevant documents.
arXiv Detail & Related papers (2024-12-19T15:45:06Z) - Quam: Adaptive Retrieval through Query Affinity Modelling [15.3583908068962]
Building relevance models to rank documents based on user information needs is a central task in the information retrieval and NLP communities.
We propose Quam, which offers a unifying view of the nascent area of adaptive retrieval.
Quam improves recall by up to 26% over standard re-ranking baselines.
arXiv Detail & Related papers (2024-10-26T22:52:12Z) - Improving Embedding Accuracy for Document Retrieval Using Entity Relationship Maps and Model-Aware Contrastive Sampling [0.0]
APEX-Embedding-7B is a 7-billion parameter decoder-only text Feature Extraction Model.
Our approach employs two training techniques that yield an emergent improvement in factual focus.
Based on our evaluations, our model establishes a new state-of-the-art standard in text feature extraction for longer context document retrieval tasks.
arXiv Detail & Related papers (2024-10-08T17:36:48Z) - Improving Attributed Text Generation of Large Language Models via Preference Learning [28.09715554543885]
We model the attribution task as preference learning and introduce an Automatic Preference Optimization framework.
APO achieves state-of-the-art citation F1 with higher answer quality.
arXiv Detail & Related papers (2024-03-27T09:19:13Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question-answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
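The kNN relevance-feedback idea in this entry can be sketched as scoring each candidate by a mix of query similarity and similarity to the user-marked-relevant documents. The mixing weight and neighborhood size below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def knn_feedback_rerank(query_vec, doc_vecs, relevant_vecs, alpha=0.5, k=3):
    """Hypothetical sketch of relevance-feedback re-ranking: score each
    candidate by (a) its cosine similarity to the query, blended with
    (b) the mean similarity to its k nearest user-relevant documents.
    Returns candidate indices, best first."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = []
    for d in doc_vecs:
        sim_q = cos(d, query_vec)
        sims_rel = sorted((cos(d, r) for r in relevant_vecs), reverse=True)[:k]
        sim_fb = sum(sims_rel) / len(sims_rel)
        scores.append(alpha * sim_q + (1 - alpha) * sim_fb)
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])
```

Candidates close to both the query and the feedback documents rise to the top of the ranking.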
arXiv Detail & Related papers (2022-10-19T16:19:37Z) - Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead): it first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
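The generate-then-read pipeline is a two-step composition; a minimal sketch, where the `llm` and `reader` callables and the prompt wording are assumptions of this sketch, not GenRead's actual components:

```python
def generate_then_read(question, llm, reader, n_docs=3):
    """Sketch of generate-then-read: prompt an LLM (passed in as a
    callable) to write contextual documents for the question, then run
    a reader over the generated documents to extract the answer."""
    prompt = f"Generate a background document that helps answer: {question}"
    docs = [llm(prompt) for _ in range(n_docs)]
    return reader(question, docs)
```

The key design point is that generated documents replace retrieved ones, so no external index is queried at inference time.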
arXiv Detail & Related papers (2022-09-21T01:30:59Z) - Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding [72.95838931445498]
Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.
However, the way they model and exploit the interactions between vision and language in documents has limited their generalization ability and accuracy.
In this work, we investigate the problem of vision-language joint representation learning for VrDU mainly from the perspective of supervisory signals.
arXiv Detail & Related papers (2022-06-27T09:58:34Z) - Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments document representations through interpolation and perturbation.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
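The two augmentations named in the title act directly on document embeddings; a minimal sketch, with the mixing coefficient and noise scale as illustrative assumptions rather than DAR's actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_embeddings(doc_a, doc_b, lam=0.7, noise_scale=0.01):
    """Sketch of two embedding-level augmentations in the DAR spirit:
    interpolation (mixup between two document embeddings) and
    perturbation (small Gaussian noise added to one embedding)."""
    interpolated = lam * doc_a + (1 - lam) * doc_b
    perturbed = doc_a + rng.normal(scale=noise_scale, size=doc_a.shape)
    return interpolated, perturbed
```

Both operations produce extra training representations without requiring additional labeled documents.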
arXiv Detail & Related papers (2022-03-15T09:07:38Z) - Recognition-free Question Answering on Handwritten Document Collections [3.0969191504482247]
We present a recognition-free Question Answering approach for handwritten documents.
Our approaches outperform the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.
arXiv Detail & Related papers (2022-02-12T14:47:44Z) - Variational Learning for Unsupervised Knowledge Grounded Dialogs [6.761874595503588]
Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document.
We develop a variational approach to the above technique, wherein we instead maximize the Evidence Lower Bound (ELBO).
To the best of our knowledge, we are the first to apply variational training to open-scale unsupervised knowledge-grounded dialog systems.
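The ELBO referred to in this entry has the standard variational form; in generic notation (an assumption, not the paper's exact formulation), with dialog context $x$, response $y$, and latent knowledge document $z$:

```latex
\log p_\theta(y \mid x)
  \ge \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(y \mid x, z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\right)
```

Maximizing the right-hand side jointly trains the response generator $p_\theta$ and the posterior document selector $q_\phi$ without supervision on which document grounds each response.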
arXiv Detail & Related papers (2021-11-23T13:41:03Z) - Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.