You Only Need One Model for Open-domain Question Answering
- URL: http://arxiv.org/abs/2112.07381v1
- Date: Tue, 14 Dec 2021 13:21:11 GMT
- Title: You Only Need One Model for Open-domain Question Answering
- Authors: Haejun Lee, Akhil Kedia, Jongwon Lee, Ashwin Paranjape, Christopher D.
Manning, and Kyoung-Gu Woo
- Abstract summary: Recent works for Open-domain Question Answering refer to an external knowledge base using a retriever model.
We propose casting the retriever and the reranker as hard-attention mechanisms applied sequentially within the transformer architecture.
We evaluate our model on the Natural Questions and TriviaQA open datasets; for a fixed parameter budget, it outperforms the previous state-of-the-art model by 1.0 and 0.7 exact match scores, respectively.
- Score: 26.582284346491686
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent works for Open-domain Question Answering refer to an external
knowledge base using a retriever model, optionally rerank the passages with a
separate reranker model and generate an answer using another reader model.
Despite performing related tasks, the models have separate parameters and are
weakly-coupled during training. In this work, we propose casting the retriever
and the reranker as hard-attention mechanisms applied sequentially within the
transformer architecture and feeding the resulting computed representations to
the reader. In this singular model architecture the hidden representations are
progressively refined from the retriever to the reranker to the reader, which
is a more efficient use of model capacity and also leads to better gradient flow
when we train it in an end-to-end manner. We also propose a pre-training
methodology to effectively train this architecture. We evaluate our model on
Natural Questions and TriviaQA open datasets and for a fixed parameter budget,
our model outperforms the previous state-of-the-art model by 1.0 and 0.7 exact
match scores.
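To make the idea concrete, here is a minimal PyTorch sketch of the retriever and reranker as sequential hard-attention cuts inside a single transformer stack. The layer split, the scoring heuristics, and every name below are our illustrative assumptions, not the authors' released code.

```python
# Minimal sketch: retrieval and reranking as hard-attention pruning
# steps between stages of one transformer (hypothetical shapes/split).
import torch
import torch.nn as nn

class RetrieverRerankerReader(nn.Module):
    def __init__(self, d_model=256, k_retrieve=8, k_rerank=2):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.question_layers = nn.ModuleList(make() for _ in range(2))
        self.rerank_layers = nn.ModuleList(make() for _ in range(2))
        self.reader_layers = nn.ModuleList(make() for _ in range(2))
        self.k1, self.k2 = k_retrieve, k_rerank

    def forward(self, question, passages):
        # question: (1, Lq, d) token states; passages: (N, Lp, d) precomputed
        q = question
        for layer in self.question_layers:
            q = layer(q)
        qvec = q[0, 0]                                 # first token as query summary
        # Retriever as hard attention: keep only the top-k1 scoring passages.
        scores = (passages @ qvec).max(dim=1).values   # (N,) best token match
        passages = passages[scores.topk(self.k1).indices]
        # Reranker stage: jointly refine the question with surviving passages.
        x = torch.cat([q.expand(passages.size(0), -1, -1), passages], dim=1)
        for layer in self.rerank_layers:
            x = layer(x)
        rerank = x[:, 0].mean(dim=-1)                  # crude per-passage score
        x = x[rerank.topk(self.k2).indices]            # second hard-attention cut
        # Reader consumes the progressively refined hidden states.
        for layer in self.reader_layers:
            x = layer(x)
        return x                                       # feed to an answer head
```

What the sketch mirrors is the control flow: each stage prunes candidates with a hard top-k and passes its refined hidden states upward, so one set of parameters is trained end-to-end.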
Related papers
- Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval [50.47192086219752]
ABEL is a simple but effective unsupervised method to enhance passage retrieval in zero-shot settings.
By either fine-tuning ABEL on labelled data or integrating it with existing supervised dense retrievers, we achieve state-of-the-art results.
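Going only by the title's "alternating distillation", the training loop can be pictured as two scorers taking turns as teacher and student; the sketch below is our hedged guess at that shape, with the KL objective and all names chosen for illustration rather than taken from ABEL.

```python
# Hedged sketch of an alternating-distillation loop (our reading of the
# title, not ABEL's actual algorithm).
import torch
import torch.nn.functional as F

def alternating_distillation(retriever, reranker, queries, passages,
                             opt_ret, opt_rank, rounds=4):
    for r in range(rounds):
        # Switch teacher/student roles every round.
        student, teacher, opt = ((retriever, reranker, opt_ret) if r % 2 == 0
                                 else (reranker, retriever, opt_rank))
        for q in queries:
            with torch.no_grad():
                target = F.softmax(teacher(q, passages), dim=-1)  # soft labels
            logits = student(q, passages)
            loss = F.kl_div(F.log_softmax(logits, dim=-1), target,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
```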
arXiv Detail & Related papers (2023-11-27T06:22:57Z)
- ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks [7.317342506617286]
A Prototypical Network based end-to-end KVP extraction model is presented.
It has no dependency on the dataset used for the model's initial training.
It needs no intermediate synthetic data generation, which tends to add noise and degrade the model's performance.
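The prototypical-network core behind such incremental few-shot learning is compact enough to sketch; everything below (shapes, names) is a generic illustration of prototypical networks, not ProtoNER's code.

```python
# Generic prototypical-network classification: a new class is added by
# averaging a few support embeddings, with no retraining and no
# synthetic data generation.
import torch

def build_prototypes(support_emb, support_labels, n_classes):
    # support_emb: (S, d) encoder outputs; one mean vector per class
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, prototypes):
    # query_emb: (Q, d); assign each query to its nearest prototype
    dists = torch.cdist(query_emb, prototypes)   # (Q, C) Euclidean distances
    return dists.argmin(dim=1)
```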
arXiv Detail & Related papers (2023-10-03T18:52:19Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
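A rough picture of that update, with a deliberately tiny single-head attention and shapes of our own choosing (not the paper's architecture):

```python
# Illustrative one-step attention over point-to-model residuals that
# refines a per-point state and yields sampling weights for RANSAC.
import torch

def attention_update(state, residuals, W_q, W_k, W_v):
    # state: (N, d) per-point estimation state; residuals: (N, H) from a
    # batch of H hypotheses; W_*: (d + H, d) projection matrices.
    x = torch.cat([state, residuals], dim=1)          # fuse state + residuals
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    att = torch.softmax(q @ k.T / q.size(1) ** 0.5, dim=-1)
    return state + att @ v                            # consensus-aware update

def sampling_weights(state, w_out):
    # Turn the state into per-point probabilities that guide the next
    # round of minimal-sample drawing; w_out: (d,) readout vector.
    return torch.softmax(state @ w_out, dim=0)
```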
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering [0.0]
We propose a framework to efficiently develop a generative question-answering model for open-book question-answering over a proprietary collection of text documents.
The framework adapts a retrieval-augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2023-07-12T04:44:31Z)
- Chain-of-Skills: A Configurable Model for Open-domain Question Answering [79.8644260578301]
The retrieval model is an indispensable component for real-world knowledge-intensive tasks.
Recent work focuses on customized methods, limiting model transferability and scalability.
We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
arXiv Detail & Related papers (2023-05-04T20:19:39Z)
- Earning Extra Performance from Restrictive Feedbacks [41.05874087063763]
We set up a challenge named Earning eXtra PerformancE from restriCTive feEDbacks (EXPECTED) to describe this class of model tuning problems.
The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedback.
We propose to characterize the geometry of the model performance with regard to model parameters through exploring the parameters' distribution.
arXiv Detail & Related papers (2023-04-28T13:16:54Z)
- Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer [80.50327229467993]
We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
We show that end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings.
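One way to read "retrieval as attention", under our own assumption about which attention scores carry the retrieval signal:

```python
# Sketch: passage relevance read off the attention mass that question
# tokens place on passage tokens inside the shared transformer. The
# head choice is our illustrative assumption.
import torch

def retrieval_scores(attn, q_len, head=0):
    # attn: (n_passages, n_heads, L, L) self-attention over [question; passage]
    q_to_p = attn[:, head, :q_len, q_len:]   # question tokens -> passage tokens
    return q_to_p.sum(dim=(1, 2))            # (n_passages,) relevance scores

# Top-scoring passages are then read by the same model, so retrieval and
# reading share one set of end-to-end-trained parameters.
```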
arXiv Detail & Related papers (2022-12-05T04:51:21Z)
- Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with recent well-designed models.
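The gradient-modification step can be sketched as an optimizer that rescales each parameter's gradient by a model-specific constant before the usual update; the class and placeholder scales below are ours, not the paper's derivation of those hyper-parameters.

```python
# Sketch of gradient re-parameterization: the structural prior lives in
# per-parameter gradient scales instead of extra architecture branches.
import torch

class GradReparamSGD(torch.optim.SGD):
    def __init__(self, params_with_scales, lr=0.1):
        # params_with_scales: list of (parameter, scale) pairs; the scale
        # values stand in for the model-specific hyper-parameters.
        params, scales = zip(*params_with_scales)
        super().__init__(list(params), lr=lr)
        self.scales = scales

    @torch.no_grad()
    def step(self, closure=None):
        for p, s in zip(self.param_groups[0]["params"], self.scales):
            if p.grad is not None:
                p.grad.mul_(s)      # inject the prior into the gradient
        return super().step(closure)
```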
arXiv Detail & Related papers (2022-05-30T16:55:59Z)
- Improving Passage Retrieval with Zero-Shot Question Generation [109.11542468380331]
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage.
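This re-scoring recipe is concrete enough to sketch with an off-the-shelf seq2seq LM; the model choice (t5-base) and prompt string are our illustrative assumptions, not necessarily the paper's setup.

```python
# Minimal re-ranker in the spirit of the summary: score each passage by
# the log-likelihood of the question conditioned on the passage.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-base")
lm = AutoModelForSeq2SeqLM.from_pretrained("t5-base").eval()

@torch.no_grad()
def rerank(question, passages):
    scores = []
    for p in passages:
        enc = tok(f"generate question: {p}", return_tensors="pt", truncation=True)
        labels = tok(question, return_tensors="pt").input_ids
        out = lm(**enc, labels=labels)                 # loss = mean token NLL
        scores.append(-out.loss.item() * labels.size(1))  # total log-prob
    return sorted(zip(passages, scores), key=lambda x: -x[1])
```

Because the scorer is zero-shot, no retrieval supervision is needed; a stronger instruction-tuned LM can be dropped in for the hypothetical t5-base.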
arXiv Detail & Related papers (2022-04-15T14:51:41Z)
- Attention-guided Generative Models for Extractive Question Answering [17.476450946279037]
Recently, pretrained generative sequence-to-sequence (seq2seq) models have achieved great success in question answering.
We propose a simple strategy to obtain an extractive answer span from the generative model by leveraging the decoder cross-attention patterns.
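A minimal version of that strategy, assuming access to the decoder's cross-attention tensor and using a windowed search of our own devising:

```python
# Sketch: aggregate decoder cross-attention over input tokens and take
# the contiguous window with the most mass as the extractive span.
import torch

def extract_span(cross_attn, max_len=10):
    # cross_attn: (layers, heads, out_len, in_len) from a seq2seq decoder
    mass = cross_attn.mean(dim=(0, 1)).sum(dim=0)   # (in_len,) per input token
    best, span = -1.0, (0, 0)
    for i in range(len(mass)):
        for j in range(i + 1, min(i + max_len, len(mass)) + 1):
            s = mass[i:j].sum().item() / (j - i) ** 0.5  # length-normalized
            if s > best:
                best, span = s, (i, j)
    return span  # token indices of the answer span in the input
```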
arXiv Detail & Related papers (2021-10-12T23:02:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.