SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- URL: http://arxiv.org/abs/2009.13013v1
- Date: Mon, 28 Sep 2020 02:11:02 GMT
- Title: SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- Authors: Tiancheng Zhao, Xiaopeng Lu, Kyusong Lee
- Abstract summary: We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering.
SPARTA learns a sparse representation that can be efficiently implemented as an Inverted Index.
We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks.
- Score: 24.77260903221371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SPARTA, a novel neural retrieval method that shows great promise
in performance, generalization, and interpretability for open-domain question
answering. Unlike many neural ranking methods that use dense vector nearest
neighbor search, SPARTA learns a sparse representation that can be efficiently
implemented as an Inverted Index. The resulting representation enables scalable
neural retrieval that does not require expensive approximate vector search and
leads to better performance than its dense counterpart. We validated our
approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval
question answering (ReQA) tasks. SPARTA achieves new state-of-the-art results
across a variety of open-domain question answering tasks in both English and
Chinese datasets, including open SQuAD, Natural Questions, CMRC, etc.
Analysis also confirms that the proposed method creates human interpretable
representation and allows flexible control over the trade-off between
performance and efficiency.
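The core idea of serving a learned sparse representation from an inverted index can be illustrated with a toy sketch. The per-(term, passage) weights below are hard-coded placeholders; in SPARTA they would come from the trained transformer that scores query terms against passage tokens, and the passage IDs and vocabulary are made up for illustration.

```python
from collections import defaultdict

# Hypothetical precomputed sparse term weights per passage. In SPARTA
# these are produced offline by the trained model; here they are dummies.
passage_term_weights = {
    "p1": {"paris": 2.1, "capital": 1.4, "france": 1.8},
    "p2": {"berlin": 2.0, "capital": 1.3, "germany": 1.7},
}

# Build the inverted index: term -> list of (passage_id, weight).
index = defaultdict(list)
for pid, weights in passage_term_weights.items():
    for term, w in weights.items():
        index[term].append((pid, w))

def search(query_terms, top_k=2):
    """Score each passage as the sum of its weights for matched query terms."""
    scores = defaultdict(float)
    for term in query_terms:
        for pid, w in index.get(term, []):
            scores[pid] += w
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(search(["capital", "france"]))  # "p1" ranks first
```

Because query-time scoring only touches the posting lists of the query's terms, retrieval scales like classical keyword search rather than requiring approximate nearest-neighbor infrastructure.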
Related papers
- pEBR: A Probabilistic Approach to Embedding Based Retrieval [4.8338111302871525]
Embedding retrieval aims to learn a shared semantic representation space for both queries and items.
In current industrial practice, retrieval systems typically retrieve a fixed number of items for different queries.
arXiv Detail & Related papers (2024-10-25T07:14:12Z) - Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z) - Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning [15.729812221628382]
We introduce a simple neural encoder architecture that can be trained using an unsupervised contrastive learning objective.
We show that when built on top of recent self-supervised audio representations, this method can be applied iteratively and yield competitive SSE.
arXiv Detail & Related papers (2022-04-11T14:28:01Z) - EfficientQA : a RoBERTa Based Phrase-Indexed Question-Answering System [0.0]
In this paper, we explore the possibility to transfer the natural language understanding of language models into dense vectors representing questions and answer candidates.
Our model achieves state-of-the-art results in Phrase-Indexed Question Answering (PIQA), beating the previous state of the art by 1.3 points in exact match and 1.4 points in F1 score.
arXiv Detail & Related papers (2021-01-06T17:46:05Z) - Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection [72.01292864036087]
Duplicate question detection (DQD) is important to increase efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method in different cross-domain scenarios of StackExchange, Spring and Quora datasets.
arXiv Detail & Related papers (2020-11-22T19:19:33Z) - Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non-convex functions, and it is challenging to guarantee their correct behavior in safety-critical contexts.
Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures.
We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv Detail & Related papers (2020-10-07T08:19:48Z) - Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z) - Dense Passage Retrieval for Open-Domain Question Answering [49.028342823838486]
We show that retrieval can be practically implemented using dense representations alone.
Our dense retriever outperforms a strong Lucene-BM25 system by a large margin, 9%-19% absolute in top-20 passage retrieval accuracy.
arXiv Detail & Related papers (2020-04-10T04:53:17Z)
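The dense approach that SPARTA is contrasted with can be sketched just as simply: every passage gets a dense embedding, and queries are scored by inner product over all of them. The 3-dimensional vectors below are made up for illustration; real systems such as DPR use learned BERT encoders and approximate nearest-neighbor search instead of the exhaustive loop shown here.

```python
# Hypothetical dense passage embeddings (real ones come from a trained encoder).
passage_vecs = {
    "p1": [0.9, 0.1, 0.3],
    "p2": [0.2, 0.8, 0.5],
}

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def dense_search(query_vec, top_k=2):
    """Exhaustive inner-product search over all passages.

    Unlike an inverted index, every passage must be scored, which is why
    dense retrieval at scale relies on approximate vector search.
    """
    scores = {pid: dot(query_vec, v) for pid, v in passage_vecs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(dense_search([1.0, 0.0, 0.0]))  # "p1" scores 0.9, ahead of "p2" at 0.2
```

The contrast with the sparse sketch above SPARTA's abstract is the cost model: dense scoring touches every passage (or an ANN index), while the inverted index only touches postings for the query's terms.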
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.