Retrieval as Attention: End-to-end Learning of Retrieval and Reading
within a Single Transformer
- URL: http://arxiv.org/abs/2212.02027v1
- Date: Mon, 5 Dec 2022 04:51:21 GMT
- Title: Retrieval as Attention: End-to-end Learning of Retrieval and Reading
within a Single Transformer
- Authors: Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie
Callan, Graham Neubig
- Abstract summary: We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
We show that end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings.
- Score: 80.50327229467993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Systems for knowledge-intensive tasks such as open-domain question answering
(QA) usually consist of two stages: efficient retrieval of relevant documents
from a large corpus and detailed reading of the selected documents to generate
answers. Retrievers and readers are usually modeled separately, which
necessitates a cumbersome implementation and is hard to train and adapt in an
end-to-end fashion. In this paper, we revisit this design and eschew the
separate architecture and training in favor of a single Transformer that
performs Retrieval as Attention (ReAtt), and end-to-end training solely based
on supervision from the end QA task. We demonstrate for the first time that a
single model trained end-to-end can achieve both competitive retrieval and QA
performance, matching or slightly outperforming state-of-the-art separately
trained retrievers and readers. Moreover, end-to-end adaptation significantly
boosts its performance on out-of-domain datasets in both supervised and
unsupervised settings, making our model a simple and adaptable solution for
knowledge-intensive tasks. Code and models are available at
https://github.com/jzbjyb/ReAtt.
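To make the "retrieval as attention" idea concrete, below is a minimal PyTorch sketch in which one shared Transformer encodes both the question and the candidate documents, and attention-style scores between question tokens and document tokens are aggregated into a document-level relevance score that can be trained end-to-end from a QA loss. The module names, sizes, and the max/mean aggregation rule are illustrative assumptions based on the abstract, not the authors' implementation (see the repository linked above for that).

```python
# Minimal sketch of "retrieval as attention": a single shared Transformer
# encodes questions and documents, and attention-style question->document
# token scores are pooled into per-document retrieval scores.
import torch
import torch.nn as nn

class RetrievalAsAttentionSketch(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # shared encoder
        self.q_proj = nn.Linear(d_model, d_model)                # "query" side of the attention
        self.k_proj = nn.Linear(d_model, d_model)                # "key" side of the attention

    def doc_scores(self, question_ids, doc_ids):
        """Return one relevance score per document, derived from attention-style scores."""
        q_tok = self.encoder(self.embed(question_ids))           # (1, Lq, d)
        d_tok = self.encoder(self.embed(doc_ids))                # (N, Ld, d)
        # Attention-style scores from every question token to every document token.
        att = torch.einsum("qld,ntd->qnlt", self.q_proj(q_tok), self.k_proj(d_tok))
        # Pool token-level scores into a document-level retrieval score
        # (max over document tokens, mean over question tokens -- an assumption).
        return att.max(dim=-1).values.mean(dim=-1).squeeze(0)    # (N,)

model = RetrievalAsAttentionSketch()
question = torch.randint(0, 32000, (1, 8))     # toy token ids
docs = torch.randint(0, 32000, (5, 32))        # 5 toy documents
scores = model.doc_scores(question, docs)
print(scores)  # higher = more relevant; trainable end-to-end from a downstream QA loss
```

Because the retrieval scores are produced inside the same network that reads the documents, gradients from the QA objective can reach them directly, which is the property the abstract highlights.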
Related papers
- Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model [18.111868378615206]
We propose a pairwise few-shot ranker that achieves performance close to that of a supervised model without requiring any complex training pipeline.
arXiv Detail & Related papers (2024-09-26T11:19:09Z) - Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - Chain-of-Skills: A Configurable Model for Open-domain Question Answering [79.8644260578301]
The retrieval model is an indispensable component for real-world knowledge-intensive tasks.
Recent work focuses on customized methods, limiting model transferability and scalability.
We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
arXiv Detail & Related papers (2023-05-04T20:19:39Z) - Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question (a rough sketch of this scheme appears after this list).
arXiv Detail & Related papers (2022-06-21T18:16:31Z) - Weakly Supervised Pre-Training for Multi-Hop Retriever [23.79574380039197]
We propose a new method for weakly supervised multi-hop retriever pre-training without human effort.
Our method includes 1) a pre-training task for generating vector representations of complex questions, 2) a scalable data generation method that produces the nested structure of question and sub-question as weak supervision for pre-training, and 3) a pre-training model structure based on dense encoders.
arXiv Detail & Related papers (2021-06-18T08:06:02Z) - Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z) - Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
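Referring back to the ART entry above ("Questions Are All You Need to Train a Dense Passage Retriever"), the following is a rough, self-contained sketch of that two-step autoencoding scheme: a dense retriever scores documents for a question, a frozen language-model stand-in scores how well each document reconstructs the question, and the retriever is trained to match the LM-derived distribution. The toy encoders, mean-pooling, KL objective, and all sizes here are assumptions for illustration, not the paper's implementation.

```python
# Sketch of an ART-style autoencoding objective: reconstruction scores from a
# (frozen) LM stand-in act as soft relevance labels for training a dense retriever.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab = 64, 1000

retr_embed = nn.Embedding(vocab, d_model)   # trainable toy dense retriever
lm_embed = nn.Embedding(vocab, d_model)     # stand-in for a frozen, pretrained LM
recon_head = nn.Linear(2 * d_model, 1)      # stand-in reconstruction scorer

def retriever_scores(question_ids, doc_ids):
    """Dot-product relevance between mean-pooled question and document embeddings."""
    q = retr_embed(question_ids).mean(dim=1)      # (1, d)
    d = retr_embed(doc_ids).mean(dim=1)           # (N, d)
    return (q @ d.t()).squeeze(0)                 # (N,)

def reconstruction_scores(question_ids, doc_ids):
    """Stand-in for log P(question | document) from a pretrained LM (kept frozen)."""
    q = lm_embed(question_ids).mean(dim=1)
    d = lm_embed(doc_ids).mean(dim=1)
    return recon_head(torch.cat([q.expand(d.size(0), -1), d], dim=-1)).squeeze(-1)

question = torch.randint(0, vocab, (1, 8))    # toy token ids
docs = torch.randint(0, vocab, (5, 32))       # 5 toy "retrieved" documents

# Step (2): LM-derived reconstruction scores become soft labels for the retriever.
with torch.no_grad():
    target = F.softmax(reconstruction_scores(question, docs), dim=0)

# The retriever is trained, with no human relevance labels, to match that distribution.
pred = F.log_softmax(retriever_scores(question, docs), dim=0)
loss = F.kl_div(pred, target, reduction="sum")
loss.backward()
print(float(loss), retr_embed.weight.grad is not None)  # gradients reach the retriever only
```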
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.