End-to-End Multihop Retrieval for Compositional Question Answering over
Long Documents
- URL: http://arxiv.org/abs/2106.00200v1
- Date: Tue, 1 Jun 2021 03:13:35 GMT
- Title: End-to-End Multihop Retrieval for Compositional Question Answering over
Long Documents
- Authors: Haitian Sun, William W. Cohen, Ruslan Salakhutdinov
- Abstract summary: We propose a multi-hop retrieval method, DocHopper, to answer compositional questions over long documents.
At each step, DocHopper retrieves a paragraph or sentence embedding from the document, mixes the retrieved result with the query, and updates the query for the next step.
We demonstrate that utilizing document structure in this way can largely improve question-answering and retrieval performance on long documents.
- Score: 93.55268936974971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Answering complex questions from long documents requires aggregating multiple
pieces of evidence and then predicting the answers. In this paper, we propose a
multi-hop retrieval method, DocHopper, to answer compositional questions over
long documents. At each step, DocHopper retrieves a paragraph or sentence
embedding from the document, mixes the retrieved result with the query, and
updates the query for the next step. In contrast to many other retrieval-based
methods (e.g., RAG or REALM) the query is not augmented with a token sequence:
instead, it is augmented by "numerically" combining it with another neural
representation. This means that the model is end-to-end differentiable. We
demonstrate that utilizing document structure in this way can largely improve
question-answering and retrieval performance on long documents. We experimented
with DocHopper on three different QA tasks that require reading long documents
to answer compositional questions: discourse entailment reasoning, factual QA
with table and text, and information seeking QA from academic papers. DocHopper
outperforms all baseline models and achieves state-of-the-art results on all
datasets. Additionally, DocHopper is efficient at inference time, being 3~10
times faster than the baselines.
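The iterative query update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: hard `argmax` selection stands in for the model's differentiable attention over candidates, and the mixing weight `alpha` is an assumed parameter.

```python
import numpy as np

def hop(query, candidates, alpha=0.5):
    """One DocHopper-style retrieval hop: score all candidate embeddings
    against the query, retrieve the best match, and numerically mix it
    into the query for the next step (alpha is illustrative)."""
    scores = candidates @ query                      # inner-product relevance
    retrieved = candidates[np.argmax(scores)]        # soft attention in the real model
    return (1 - alpha) * query + alpha * retrieved   # numeric query update

rng = np.random.default_rng(0)
query = rng.normal(size=8)            # initial question embedding
paragraphs = rng.normal(size=(5, 8))  # precomputed paragraph/sentence embeddings

for _ in range(3):                    # three retrieval hops
    query = hop(query, paragraphs)
print(query.shape)
```

Because the query is updated by combining vectors rather than appending retrieved tokens, every step stays differentiable, which is what allows the retriever to be trained end-to-end.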
Related papers
- DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering [4.364937306005719]
RAG has recently demonstrated the performance of Large Language Models (LLMs) in knowledge-intensive tasks such as Question-Answering (QA).
We have found that even though there is low relevance between some critical documents and query, it is possible to retrieve the remaining documents by combining parts of the documents with the query.
A two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG) is proposed to improve document retrieval recall and the accuracy of answers.
arXiv Detail & Related papers (2024-06-11T15:15:33Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
- Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method which targets at improving relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.