PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question
Answering Research and Development
- URL: http://arxiv.org/abs/2301.09715v2
- Date: Wed, 25 Jan 2023 15:48:03 GMT
- Title: PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question
Answering Research and Development
- Authors: Avirup Sil, Jaydeep Sen, Bhavani Iyer, Martin Franz, Kshitij Fadnis,
Mihaela Bornea, Sara Rosenthal, Scott McCarley, Rong Zhang, Vishwajeet Kumar,
Yulong Li, Md Arafat Sultan, Riyaz Bhat, Radu Florian, Salim Roukos
- Abstract summary: PRIMEQA is a one-stop QA repository with an aim to democratize QA re-search and facilitate easy replication of state-of-the-art (SOTA) QA methods.
It supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation.
It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on pub-lic benchmarks, and expanding pre-existing methods.
- Score: 24.022050096797606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of Question Answering (QA) has made remarkable progress in recent
years, thanks to the advent of large pre-trained language models, newer
realistic benchmark datasets with leaderboards, and novel algorithms for key
components such as retrievers and readers. In this paper, we introduce PRIMEQA:
a one-stop and open-source QA repository with an aim to democratize QA
re-search and facilitate easy replication of state-of-the-art (SOTA) QA
methods. PRIMEQA supports core QA functionalities like retrieval and reading
comprehension as well as auxiliary capabilities such as question generation.It
has been designed as an end-to-end toolkit for various use cases: building
front-end applications, replicating SOTA methods on pub-lic benchmarks, and
expanding pre-existing methods. PRIMEQA is available at :
https://github.com/primeqa.
Related papers
- Building Efficient and Effective OpenQA Systems for Low-Resource Languages [17.64851283209797]
We show that effective, low-cost OpenQA systems can be developed for low-resource contexts.
Key ingredients are weak supervision using machine-translated labeled datasets and a relevant unstructured knowledge source.
We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources.
arXiv Detail & Related papers (2024-01-07T22:11:36Z) - Long-Tailed Question Answering in an Open World [46.67715607552547]
We define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data.
We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks.
On a large-scale OLTQA dataset, our model consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-05-11T04:28:58Z) - Augmenting Pre-trained Language Models with QA-Memory for Open-Domain
Question Answering [38.071375112873675]
We propose a question-answer augmented encoder-decoder model and accompanying pretraining strategy.
This yields an end-to-end system that outperforms prior QA retrieval methods on single-hop QA tasks.
arXiv Detail & Related papers (2022-04-10T02:33:00Z) - Retrieving and Reading: A Comprehensive Survey on Open-domain Question
Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce modern OpenQA architecture named Retriever-Reader'' and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z) - Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z) - NeuralQA: A Usable Library for Question Answering (Contextual Query
Expansion + BERT) on Large Datasets [0.6091702876917281]
NeuralQA is a library for Question Answering (QA) on large datasets.
It integrates with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks.
Code and documentation for NeuralQA is available as open source on Github.
arXiv Detail & Related papers (2020-07-30T03:38:30Z) - Generating Diverse and Consistent QA pairs from Contexts with
Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA)
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named as RefQA)
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - Template-Based Question Generation from Retrieved Sentences for Improved
Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.