Two-Step Question Retrieval for Open-Domain QA
- URL: http://arxiv.org/abs/2205.09393v1
- Date: Thu, 19 May 2022 08:46:14 GMT
- Title: Two-Step Question Retrieval for Open-Domain QA
- Authors: Yeon Seonwoo, Juhee Son, Jiho Jin, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo
Ha, Alice Oh
- Abstract summary: The retriever-reader pipeline has shown promising performance in open-domain QA but suffers from a very slow inference speed.
Recently proposed question retrieval models tackle this problem by indexing question-answer pairs and searching for similar questions.
SQuID significantly increases the performance of existing question retrieval models with a negligible loss on inference speed.
- Score: 27.37731471419776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The retriever-reader pipeline has shown promising performance in open-domain
QA but suffers from a very slow inference speed. Recently proposed question
retrieval models tackle this problem by indexing question-answer pairs and
searching for similar questions. These models have shown a significant increase
in inference speed, but at the cost of lower QA performance compared to the
retriever-reader models. This paper proposes a two-step question retrieval
model, SQuID (Sequential Question-Indexed Dense retrieval) and distant
supervision for training. SQuID uses two bi-encoders for question retrieval.
The first-step retriever selects top-k similar questions, and the second-step
retriever finds the most similar question from the top-k questions. We evaluate
the performance and the computational efficiency of SQuID. The results show
that SQuID significantly increases the performance of existing question
retrieval models with a negligible loss on inference speed.
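The two-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the `embed_coarse` and `embed_fine` functions are deterministic hash-based placeholders standing in for SQuID's two trained bi-encoders, which the abstract does not specify in enough detail to reproduce.

```python
# Illustrative sketch of two-step question retrieval in the spirit of SQuID.
# The embed_* functions are hypothetical placeholders for trained bi-encoders,
# not the paper's actual models; they only make the control flow runnable.
import zlib
import numpy as np

def _toy_embedding(text: str, salt: str, dim: int = 32) -> np.ndarray:
    # Deterministic stand-in for a trained encoder: identical inputs map to
    # identical unit vectors, so an exact-match query scores highest.
    rng = np.random.default_rng(zlib.crc32((salt + text).encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def embed_coarse(text: str) -> np.ndarray:
    # First-step bi-encoder: fast, scans the full question index.
    return _toy_embedding(text, salt="coarse:")

def embed_fine(text: str) -> np.ndarray:
    # Second-step bi-encoder: re-scores only the k surviving candidates.
    return _toy_embedding(text, salt="fine:")

def two_step_retrieve(query: str, indexed_questions: list[str], k: int = 5) -> str:
    # Step 1: inner-product search over all indexed questions, keep top-k.
    index = np.stack([embed_coarse(q) for q in indexed_questions])
    coarse_scores = index @ embed_coarse(query)
    top_k = np.argsort(-coarse_scores)[:k]
    # Step 2: re-rank the k candidates with the second encoder and return
    # the single most similar indexed question.
    fine_query = embed_fine(query)
    best = max(top_k, key=lambda i: float(embed_fine(indexed_questions[i]) @ fine_query))
    return indexed_questions[best]
```

Because only k candidates ever reach the second encoder, the extra cost over a single-step retriever stays small, consistent with the abstract's claim of a negligible loss in inference speed.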
Related papers
- Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs).
Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines on common tasks such as Question Answering (QA).
arXiv Detail & Related papers (2024-11-11T22:06:51Z)
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late interaction models and surprisingly lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
- AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions [10.272000561545331]
We propose AGent, a novel pipeline that creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer.
In this paper, we demonstrate the usefulness of this AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA.
arXiv Detail & Related papers (2023-09-10T18:13:11Z)
- ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z)
- Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out two drawbacks in current RVQA research: (1) datasets contain too many unchallenging UQs, and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
It also includes pseudo UQs obtained by randomly pairing images and questions.
arXiv Detail & Related papers (2023-03-09T06:58:29Z)
- OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop Approach [11.057028572260064]
We propose a model named OneStop to generate QA pairs from documents in a one-stop approach.
Specifically, questions and their corresponding answer spans are extracted simultaneously.
OneStop is much more efficient to train and deploy in industrial scenarios, since it involves only one model to solve the complex QA generation task.
arXiv Detail & Related papers (2021-02-24T08:45:00Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.