QUADRo: Dataset and Models for QUestion-Answer Database Retrieval
- URL: http://arxiv.org/abs/2304.01003v1
- Date: Thu, 30 Mar 2023 00:42:07 GMT
- Title: QUADRo: Dataset and Models for QUestion-Answer Database Retrieval
- Authors: Stefano Campese, Ivano Lauriola, Alessandro Moschitti
- Abstract summary: Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions.
We build a large-scale DB of 6.3M q/a pairs, using public questions, and design a new system based on neural IR and a q/a pair reranker.
We show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the BING search engine.
- Score: 97.84448420852854
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An effective paradigm for building Automated Question Answering systems is
the re-use of previously answered questions, e.g., for FAQs or forum
applications. Given a database (DB) of question/answer (q/a) pairs, it is
possible to answer a target question by scanning the DB for similar questions.
In this paper, we scale this approach to open domain, making it competitive
with other standard methods, e.g., unstructured document or graph based. For
this purpose, we (i) build a large-scale DB of 6.3M q/a pairs, using public
questions, (ii) design a new system based on neural IR and a q/a pair reranker,
and (iii) construct training and test data to perform comparative experiments
with our models. We demonstrate that Transformer-based models using (q,a) pairs
outperform models only based on question representation, for both neural search
and reranking. Additionally, we show that our DB-based approach is competitive
with Web-based methods, i.e., a QA system built on top of the BING search engine,
demonstrating the challenge of finding relevant information. Finally, we make
our data and models available for future research.
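The two-stage idea described in the abstract (retrieve similar stored questions, then rerank candidates using the full q/a pair) can be sketched as follows. This is only a toy illustration, not the authors' system: a bag-of-words cosine similarity stands in for the Transformer-based neural retriever and reranker, and `TOY_DB`, `retrieve`, `rerank`, and `answer` are hypothetical names chosen for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a toy stand-in for a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two token-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, db, k=2):
    """Stage 1: rank stored pairs by query-to-question similarity, keep top k."""
    q = bag_of_words(query)
    ranked = sorted(db, key=lambda pair: cosine(q, bag_of_words(pair[0])), reverse=True)
    return ranked[:k]


def rerank(query, candidates):
    """Stage 2: rescore candidates using the question and answer text together."""
    q = bag_of_words(query)
    return sorted(
        candidates,
        key=lambda pair: cosine(q, bag_of_words(pair[0] + " " + pair[1])),
        reverse=True,
    )


def answer(query, db, k=2):
    """Return the answer of the best reranked pair, or None for an empty DB."""
    top = rerank(query, retrieve(query, db, k))
    return top[0][1] if top else None


# Hypothetical three-entry database of (question, answer) pairs.
TOY_DB = [
    ("What is the capital of France?", "Paris is the capital of France."),
    ("How tall is Mount Everest?", "Mount Everest is about 8,849 metres tall."),
    ("Who wrote Hamlet?", "William Shakespeare wrote Hamlet."),
]

if __name__ == "__main__":
    print(answer("Who wrote the play Hamlet?", TOY_DB))
```

In a real system, both stages would use learned encoders and the first stage would use an approximate nearest-neighbour index rather than a full scan; the sketch only shows how the reranker sees the answer text while the retriever here sees only the question.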
Related papers
- A Lightweight Method to Generate Unanswerable Questions in English [18.323248259867356]
We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models.
arXiv Detail & Related papers (2023-10-30T10:14:52Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the AS2 task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
- Joint Models for Answer Verification in Question Answering Systems [85.93456768689404]
We build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one.
We tested our models on WikiQA, TREC-QA, and a real-world dataset.
arXiv Detail & Related papers (2021-07-09T05:34:36Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- TCNN: Triple Convolutional Neural Network Models for Retrieval-based Question Answering System in E-commerce [6.1786972717541895]
The key step in IR-based models is to retrieve the knowledge entries most similar to a given query from a QA knowledge base, and then rerank those entries with semantic matching models.
In this paper, we aim to improve an IR-based e-commerce QA system, AliMe, with the proposed text matching models, including a basic Triple Convolutional Neural Network (TCNN) model and two Attention-based TCNN (ATCNN) models.
arXiv Detail & Related papers (2020-04-23T01:02:15Z)