End-to-End Multihop Retrieval for Compositional Question Answering over
Long Documents
- URL: http://arxiv.org/abs/2106.00200v1
- Date: Tue, 1 Jun 2021 03:13:35 GMT
- Title: End-to-End Multihop Retrieval for Compositional Question Answering over
Long Documents
- Authors: Haitian Sun, William W. Cohen, Ruslan Salakhutdinov
- Abstract summary: We propose a multi-hop retrieval method, DocHopper, to answer compositional questions over long documents.
At each step, DocHopper retrieves a paragraph or sentence embedding from the document, mixes the retrieved result with the query, and updates the query for the next step.
We demonstrate that utilizing document structure in this way can largely improve question-answering and retrieval performance on long documents.
- Score: 93.55268936974971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Answering complex questions from long documents requires aggregating multiple
pieces of evidence and then predicting the answers. In this paper, we propose a
multi-hop retrieval method, DocHopper, to answer compositional questions over
long documents. At each step, DocHopper retrieves a paragraph or sentence
embedding from the document, mixes the retrieved result with the query, and
updates the query for the next step. In contrast to many other retrieval-based
methods (e.g., RAG or REALM) the query is not augmented with a token sequence:
instead, it is augmented by "numerically" combining it with another neural
representation. This means that the model is end-to-end differentiable. We
demonstrate that utilizing document structure in this way can largely improve
question-answering and retrieval performance on long documents. We experimented
with DocHopper on three different QA tasks that require reading long documents
to answer compositional questions: discourse entailment reasoning, factual QA
with table and text, and information seeking QA from academic papers. DocHopper
outperforms all baseline models and achieves state-of-the-art results on all
datasets. Additionally, DocHopper is efficient at inference time, being 3~10
times faster than the baselines.
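The iterative query update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: hard `argmax` selection stands in for the model's differentiable attention over candidates, and the mixing weight `alpha` is an assumed parameter.

```python
import numpy as np

def hop(query, candidates, alpha=0.5):
    """One DocHopper-style retrieval hop: score all candidate embeddings
    against the query, retrieve the best match, and numerically mix it
    into the query for the next step (alpha is illustrative)."""
    scores = candidates @ query                      # inner-product relevance
    retrieved = candidates[np.argmax(scores)]        # soft attention in the real model
    return (1 - alpha) * query + alpha * retrieved   # numeric query update

rng = np.random.default_rng(0)
query = rng.normal(size=8)            # initial question embedding
paragraphs = rng.normal(size=(5, 8))  # precomputed paragraph/sentence embeddings

for _ in range(3):                    # three retrieval hops
    query = hop(query, paragraphs)
print(query.shape)
```

Because the query is updated by combining vectors rather than appending retrieved tokens, every step stays differentiable, which is what allows the retriever to be trained end-to-end.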
Related papers
- DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering [4.364937306005719]
RAG has recently demonstrated the performance of Large Language Models (LLMs) in knowledge-intensive tasks such as Question-Answering (QA).
We have found that even though there is low relevance between some critical documents and query, it is possible to retrieve the remaining documents by combining parts of the documents with the query.
A two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG) is proposed to improve document retrieval recall and the accuracy of answers.
arXiv Detail & Related papers (2024-06-11T15:15:33Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
- Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method which targets at improving relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.