SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- URL: http://arxiv.org/abs/2009.13013v1
- Date: Mon, 28 Sep 2020 02:11:02 GMT
- Title: SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- Authors: Tiancheng Zhao, Xiaopeng Lu, Kyusong Lee
- Abstract summary: We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering.
SPARTA learns a sparse representation that can be efficiently implemented as an Inverted Index.
We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks.
- Score: 24.77260903221371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SPARTA, a novel neural retrieval method that shows great promise
in performance, generalization, and interpretability for open-domain question
answering. Unlike many neural ranking methods that use dense vector nearest
neighbor search, SPARTA learns a sparse representation that can be efficiently
implemented as an Inverted Index. The resulting representation enables scalable
neural retrieval that does not require expensive approximate vector search and
leads to better performance than its dense counterpart. We validated our
approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval
question answering (ReQA) tasks. SPARTA achieves new state-of-the-art results
across a variety of open-domain question answering tasks in both English and
Chinese datasets, including open SQuAD, Natural Questions, CMRC, etc.
Analysis also confirms that the proposed method creates human interpretable
representation and allows flexible control over the trade-off between
performance and efficiency.
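The core idea of serving a learned sparse representation from an inverted index can be illustrated with a toy sketch. The per-(term, passage) weights below are hard-coded placeholders; in SPARTA they would come from the trained transformer that scores query terms against passage tokens, and the passage IDs and vocabulary are made up for illustration.

```python
from collections import defaultdict

# Hypothetical precomputed sparse term weights per passage. In SPARTA
# these are produced offline by the trained model; here they are dummies.
passage_term_weights = {
    "p1": {"paris": 2.1, "capital": 1.4, "france": 1.8},
    "p2": {"berlin": 2.0, "capital": 1.3, "germany": 1.7},
}

# Build the inverted index: term -> list of (passage_id, weight).
index = defaultdict(list)
for pid, weights in passage_term_weights.items():
    for term, w in weights.items():
        index[term].append((pid, w))

def search(query_terms, top_k=2):
    """Score each passage as the sum of its weights for matched query terms."""
    scores = defaultdict(float)
    for term in query_terms:
        for pid, w in index.get(term, []):
            scores[pid] += w
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(search(["capital", "france"]))  # "p1" ranks first
```

Because query-time scoring only touches the posting lists of the query's terms, retrieval scales like classical keyword search rather than requiring approximate nearest-neighbor infrastructure.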
Related papers
- pEBR: A Probabilistic Approach to Embedding Based Retrieval [4.8338111302871525]
Embedding retrieval aims to learn a shared semantic representation space for both queries and items.
In current industrial practice, retrieval systems typically retrieve a fixed number of items for different queries.
arXiv Detail & Related papers (2024-10-25T07:14:12Z) - Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z) - Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning [15.729812221628382]
We introduce a simple neural encoder architecture that can be trained using an unsupervised contrastive learning objective.
We show that when built on top of recent self-supervised audio representations, this method can be applied iteratively and yield competitive SSE.
arXiv Detail & Related papers (2022-04-11T14:28:01Z) - EfficientQA : a RoBERTa Based Phrase-Indexed Question-Answering System [0.0]
In this paper, we explore the possibility to transfer the natural language understanding of language models into dense vectors representing questions and answer candidates.
Our model achieves state-of-the-art results in Phrase-Indexed Question Answering (PIQA), beating the previous state of the art by 1.3 points in exact match and 1.4 points in F1 score.
arXiv Detail & Related papers (2021-01-06T17:46:05Z) - Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection [72.01292864036087]
Duplicate question detection (DQD) is important to increase efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method in different cross-domain scenarios of StackExchange, Spring and Quora datasets.
arXiv Detail & Related papers (2020-11-22T19:19:33Z) - Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non-convex functions, and it is challenging to guarantee their correct behavior in safety-critical contexts.
Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures.
We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv Detail & Related papers (2020-10-07T08:19:48Z) - Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z) - Dense Passage Retrieval for Open-Domain Question Answering [49.028342823838486]
We show that retrieval can be practically implemented using dense representations alone.
Our dense retriever outperforms a strong Lucene-BM25 system by a large margin, 9%-19% absolute in top-20 passage retrieval accuracy.
arXiv Detail & Related papers (2020-04-10T04:53:17Z)
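The dense approach that SPARTA is contrasted with can be sketched just as simply: every passage gets a dense embedding, and queries are scored by inner product over all of them. The 3-dimensional vectors below are made up for illustration; real systems such as DPR use learned BERT encoders and approximate nearest-neighbor search instead of the exhaustive loop shown here.

```python
# Hypothetical dense passage embeddings (real ones come from a trained encoder).
passage_vecs = {
    "p1": [0.9, 0.1, 0.3],
    "p2": [0.2, 0.8, 0.5],
}

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def dense_search(query_vec, top_k=2):
    """Exhaustive inner-product search over all passages.

    Unlike an inverted index, every passage must be scored, which is why
    dense retrieval at scale relies on approximate vector search.
    """
    scores = {pid: dot(query_vec, v) for pid, v in passage_vecs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(dense_search([1.0, 0.0, 0.0]))  # "p1" scores 0.9, ahead of "p2" at 0.2
```

The contrast with the sparse sketch above SPARTA's abstract is the cost model: dense scoring touches every passage (or an ANN index), while the inverted index only touches postings for the query's terms.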
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.