FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection
- URL: http://arxiv.org/abs/2408.06333v1
- Date: Mon, 12 Aug 2024 17:50:02 GMT
- Title: FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection
- Authors: Yufei Huang, Xu Han, Maosong Sun
- Abstract summary: FastFiD is a novel approach that executes sentence selection on encoded passages.
This aids in retaining valuable sentences while reducing the context length required for generating answers.
- Score: 61.9638234358049
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Open Domain Question Answering (ODQA) has been advancing rapidly in recent times, driven by significant developments in dense passage retrieval and pretrained language models. Current models typically incorporate the FiD framework, which comprises a neural retriever alongside an encoder-decoder neural reader. In the answer generation process, the retriever retrieves numerous passages (around 100, for instance), each of which is then individually encoded by the encoder. Subsequently, the decoder makes predictions based on these encoded passages. Nevertheless, this framework can be relatively time-consuming, particularly due to the extensive length of the gathered passages. To address this, we introduce FastFiD in this paper, a novel approach that executes sentence selection on the encoded passages. This aids in retaining valuable sentences while reducing the context length required for generating answers. Experiments on three commonly used datasets (Natural Questions, TriviaQA and ASQA) demonstrate that our method can enhance the inference speed by 2.3X-5.7X, while simultaneously maintaining the model's performance. Moreover, an in-depth analysis of the model's attention reveals that the selected sentences indeed hold a substantial contribution towards the final answer. The code is publicly available at https://github.com/thunlp/FastFiD.
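To make the described pipeline concrete, here is a minimal sketch of sentence selection over encoded passages, assuming a hypothetical linear scoring head over encoder states; all names and shapes are illustrative, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of FiD-style generation with sentence selection; the scoring
# head and all shapes are illustrative assumptions, not the FastFiD code.
import torch
import torch.nn as nn

class SentenceSelector(nn.Module):
    """Scores encoded sentence spans and keeps only the top-k."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # assumed scoring head

    def forward(self, encoded: torch.Tensor, spans, k: int):
        # encoded: (total_tokens, hidden), all passages encoded then concatenated
        # spans: [(start, end), ...] token offsets of each sentence
        scores = torch.stack([self.scorer(encoded[s:e]).mean() for s, e in spans])
        keep = scores.topk(min(k, len(spans))).indices
        # Keep only the selected sentences: the decoder then cross-attends
        # over a much shorter context when generating the answer.
        selected = torch.cat([encoded[spans[i][0]:spans[i][1]] for i in keep])
        return selected, keep

# Toy usage: 100 passages x 200 tokens, 400 fake "sentences" of 50 tokens each.
hidden = 768
encoded = torch.randn(100 * 200, hidden)              # stand-in encoder output
spans = [(i * 50, (i + 1) * 50) for i in range(400)]  # fake sentence offsets
selector = SentenceSelector(hidden)
context, kept = selector(encoded, spans, k=20)
print(context.shape)  # (1000, 768): ~5% of the original context length
```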
Related papers
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
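For context on the technique, below is a minimal sketch of the generic draft-then-verify speculative decoding loop (greedy variant); the draft/target predictors and acceptance rule are placeholder assumptions and do not reproduce Chimera's token-fusion draft model.

```python
# Generic draft-then-verify speculative decoding (greedy variant).
# `draft_next` and `target_next` are placeholder token predictors.
from typing import Callable, List

def speculative_decode(prefix: List[int],
                       draft_next: Callable[[List[int]], int],
                       target_next: Callable[[List[int]], int],
                       gamma: int = 4, max_new: int = 16) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) The cheap draft model proposes `gamma` tokens.
        proposal = []
        for _ in range(gamma):
            proposal.append(draft_next(out + proposal))
        # 2) The target model verifies proposals left to right; the first
        #    mismatch is replaced with the target's token and the rest dropped.
        #    (A real implementation verifies all proposals in one batched pass.)
        accepted = 0
        for tok in proposal:
            if target_next(out) == tok:
                out.append(tok)
                accepted += 1
            else:
                out.append(target_next(out))
                break
        if accepted == gamma:
            out.append(target_next(out))  # all accepted: one bonus token
    return out

# Toy demo: the "target" counts up; the "draft" agrees three times out of four.
target = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + (1 if seq[-1] % 4 else 2)
print(speculative_decode([0], draft, target, max_new=8))
```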
- Reranking Passages with Coarse-to-Fine Neural Retriever Enhanced by List-Context Information [0.9463895540925061]
This paper presents a list-context attention mechanism to augment the passage representation by incorporating the list-context information from other candidates.
The proposed coarse-to-fine (C2F) neural retriever addresses the out-of-memory limitation of the passage attention mechanism.
It integrates the coarse and fine rankers into a joint optimization process, allowing feedback between the two layers so that both are updated simultaneously.
arXiv Detail & Related papers (2023-08-23T09:29:29Z)
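A minimal sketch of the coarse-to-fine pattern: a cheap per-passage scorer prunes the candidate list, then a scorer that sees the surviving list as a whole reranks it. Both scorers are stand-ins; the paper's list-context attention mechanism is not reproduced.

```python
# Coarse-to-fine reranking sketch: a cheap coarse scorer prunes candidates,
# then a list-aware fine scorer reranks the survivors together.
import numpy as np

def coarse_score(query_vec, passage_vecs):
    # Cheap independent scoring, e.g. one dot product per passage.
    return passage_vecs @ query_vec

def fine_score(query_vec, passage_vecs):
    # "List-context" stand-in: each passage is scored relative to the mean of
    # the surviving list, so candidates see information about each other.
    list_context = passage_vecs.mean(axis=0)
    return (passage_vecs + list_context) @ query_vec

def rerank(query_vec, passage_vecs, keep: int = 8, final: int = 3):
    coarse = coarse_score(query_vec, passage_vecs)
    survivors = np.argsort(-coarse)[:keep]           # coarse pruning stage
    fine = fine_score(query_vec, passage_vecs[survivors])
    return survivors[np.argsort(-fine)][:final]      # fine list-aware stage

rng = np.random.default_rng(0)
q, P = rng.normal(size=64), rng.normal(size=(100, 64))
print(rerank(q, P))  # indices of the top-3 passages
```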
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
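To illustrate the paradigm, a plausible sketch: rerank documents by the likelihood the model assigns to the query given each document, so that document encoder passes can be cached offline and query-time work reduces to decoder (LM-style) steps. The T5 stand-in below is an assumption, not the paper's exact model or training recipe.

```python
# Query-likelihood reranking with an encoder-decoder model:
# score(doc) = log P(query | doc). A small T5 serves as a stand-in.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

def query_loglikelihood(document: str, query: str) -> float:
    inputs = tok(document, return_tensors="pt", truncation=True)
    labels = tok(query, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean token cross-entropy; negate and scale
        # by length to approximate the total query log-probability.
        loss = model(**inputs, labels=labels).loss
    return -loss.item() * labels.size(1)

docs = ["The Eiffel Tower is in Paris.", "Bananas are rich in potassium."]
query = "where is the eiffel tower"
ranked = sorted(docs, key=lambda d: query_loglikelihood(d, query), reverse=True)
print(ranked[0])
```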
- KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering [68.00631278030627]
We propose a novel method KG-FiD, which filters noisy passages by leveraging the structural relationship among the retrieved passages with a knowledge graph.
We show that KG-FiD can improve vanilla FiD by up to 1.5% on answer exact match score and achieve performance comparable to FiD with only 40% of the computation cost.
arXiv Detail & Related papers (2021-10-08T18:39:59Z)
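As a toy illustration of KG-based filtering (a deliberate simplification of KG-FiD's graph-based reranking): keep a retrieved passage only if one of its entities is linked in the knowledge graph to an entity of a top-ranked passage. The KG fragment and entity annotations below are invented for the example.

```python
# Toy knowledge-graph passage filtering: a passage survives only if one of its
# entities is KG-linked to an entity of a top-ranked passage.
KG_EDGES = {("Paris", "France"), ("Eiffel Tower", "Paris"),
            ("Banana", "Potassium")}  # hypothetical KG fragment

def linked(e1: str, e2: str) -> bool:
    return (e1, e2) in KG_EDGES or (e2, e1) in KG_EDGES

def filter_passages(passages, top_k: int = 1):
    """passages: list of (text, entities), already sorted by retriever score."""
    anchor_entities = {e for _, ents in passages[:top_k] for e in ents}
    kept = list(passages[:top_k])
    for text, ents in passages[top_k:]:
        if any(linked(e, a) for e in ents for a in anchor_entities):
            kept.append((text, ents))  # structurally related: keep
        # else: treat as a noisy passage and drop it
    return kept

retrieved = [("Eiffel Tower facts ...", ["Eiffel Tower"]),
             ("Paris travel guide ...", ["Paris"]),
             ("Banana nutrition ...", ["Banana"])]
print([t for t, _ in filter_passages(retrieved)])
# keeps the Paris passage (linked to Eiffel Tower), drops the banana one
```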
- Question Answering Infused Pre-training of General-Purpose Contextualized Representations [70.62967781515127]
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations.
We accomplish this goal by training a bi-encoder QA model, which independently encodes passages and questions, to match the predictions of a more accurate cross-encoder model.
We show large improvements over both RoBERTa-large and previous state-of-the-art results on zero-shot and few-shot paraphrase detection.
arXiv Detail & Related papers (2021-06-15T14:45:15Z)
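A minimal sketch of the training signal described above: a bi-encoder that scores passages with a dot product is trained to match a cross-encoder's distribution over candidates via a KL loss. The encoders and features here are random stand-ins.

```python
# Bi-encoder distillation sketch: dot-product scores from independent
# question/passage encoders are trained toward a cross-encoder's distribution.
import torch
import torch.nn.functional as F

q_enc = torch.nn.Linear(32, 64)   # stand-in question encoder
p_enc = torch.nn.Linear(32, 64)   # stand-in passage encoder
opt = torch.optim.Adam(list(q_enc.parameters()) + list(p_enc.parameters()))

def distill_step(q_feat, p_feats, cross_scores):
    # Bi-encoder scores: one dot product per candidate passage.
    bi_scores = p_enc(p_feats) @ q_enc(q_feat)
    # Match the cross-encoder teacher's distribution over candidates.
    loss = F.kl_div(F.log_softmax(bi_scores, dim=0),
                    F.softmax(cross_scores, dim=0), reduction="sum")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

q = torch.randn(32)       # toy question features
ps = torch.randn(8, 32)   # 8 candidate passage features
teacher = torch.randn(8)  # pretend cross-encoder logits
for _ in range(3):
    print(distill_step(q, ps, teacher))  # loss should trend downward
```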
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
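A minimal sketch of a Cross-Thought-style objective, assuming the goal is to recover a masked token in one short sequence by attending over pooled embeddings of neighboring sequences; the modules and dimensions are illustrative assumptions.

```python
# Sketch: predict a masked token in sequence 0 by cross-attending over the
# pooled embeddings of the neighboring short sequences.
import torch
import torch.nn as nn

vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
encoder = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
predict = nn.Linear(dim, vocab)

def cross_thought_logits(sequences: torch.Tensor, masked_pos: int):
    # sequences: (n_seqs, seq_len) token ids; sequence 0 holds the mask.
    h = encoder(embed(sequences))             # encode each short sequence
    pooled = h[1:].mean(dim=1).unsqueeze(0)   # one vector per neighbor sequence
    query = h[0, masked_pos].view(1, 1, dim)  # hidden state at the masked slot
    # The masked position attends across the neighbor-sequence embeddings.
    ctx, _ = cross_attn(query, pooled, pooled)
    return predict(ctx.squeeze())             # logits over the vocabulary

seqs = torch.randint(0, vocab, (4, 12))       # 1 masked seq + 3 neighbors
print(cross_thought_logits(seqs, masked_pos=5).shape)  # torch.Size([1000])
```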
- Composing Answer from Multi-spans for Reading Comprehension [77.32873012668783]
We present a novel method to generate answers for non-extractive machine reading comprehension (MRC) tasks.
The proposed method performs better at accurately generating long answers and substantially outperforms two competitive baselines, a typical one-span decoder and a Seq2Seq decoder.
arXiv Detail & Related papers (2020-09-14T01:44:42Z)
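To illustrate multi-span answer composition: a per-token "in-answer" tagger (toy probabilities stand in for a trained model) selects tokens, and contiguous runs are concatenated in order to form the final answer.

```python
# Toy multi-span composition: select tokens tagged as "in-answer" and join
# the contiguous runs, in order, into one answer string.
def compose_answer(tokens, in_answer_probs, threshold=0.5):
    spans, current = [], []
    for tok, p in zip(tokens, in_answer_probs):
        if p >= threshold:
            current.append(tok)          # extend the current span
        elif current:
            spans.append(" ".join(current))
            current = []
        # else: between spans, keep scanning
    if current:
        spans.append(" ".join(current))
    return " ".join(spans)               # multi-span composition

tokens = "the tower built in 1889 stands in paris france".split()
probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.9, 0.9]
print(compose_answer(tokens, probs))  # -> "tower built 1889 paris france"
```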