AGent: A Novel Pipeline for Automatically Creating Unanswerable
Questions
- URL: http://arxiv.org/abs/2309.05103v1
- Date: Sun, 10 Sep 2023 18:13:11 GMT
- Title: AGent: A Novel Pipeline for Automatically Creating Unanswerable
Questions
- Authors: Son Quoc Tran, Gia-Huy Do, Phong Nguyen-Thuan Do, Matt Kretchmar,
Xinya Du
- Abstract summary: We propose AGent, a novel pipeline that creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer.
In this paper, we demonstrate the usefulness of this AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA.
- Score: 10.272000561545331
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The development of large high-quality datasets and high-performing models
has led to significant advancements in the domain of Extractive Question
Answering (EQA). This progress has sparked considerable interest in exploring
unanswerable questions within the EQA domain. Training EQA models with
unanswerable questions helps them avoid extracting misleading or incorrect
answers for queries that lack valid responses. However, manually annotating
unanswerable questions is labor-intensive. To address this, we propose AGent, a
novel pipeline that automatically creates new unanswerable questions by
re-matching a question with a context that lacks the necessary information for
a correct answer. In this paper, we demonstrate the usefulness of this AGent
pipeline by creating two sets of unanswerable questions from answerable
questions in SQuAD and HotpotQA. These created question sets exhibit low error
rates. Additionally, models fine-tuned on these questions show comparable
performance with those fine-tuned on the SQuAD 2.0 dataset on multiple EQA
benchmarks.
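The re-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `QAExample` type, the `rematch` function, the word-overlap ranking, and the answer-string check are all assumptions made for illustration. An exact-string check is only a rough proxy, since a context can still answer a question in other words.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QAExample:
    """Hypothetical container for one answerable EQA example."""
    question: str
    context: str
    answer: str


def tokens(text: str) -> set[str]:
    # Lowercased word set with trailing/leading punctuation stripped.
    return {w.strip(".,?!").lower() for w in text.split()}


def rematch(examples: list[QAExample]) -> list[tuple[str, str]]:
    """Pair each question with the most lexically similar *other* context
    that does not contain the gold answer string (an assumed heuristic)."""
    pairs = []
    for ex in examples:
        q_tokens = tokens(ex.question)
        candidates = [
            other.context
            for other in examples
            if other.context != ex.context
            and ex.answer.lower() not in other.context.lower()
        ]
        if not candidates:
            continue  # no context lacks the answer, so skip this question
        best = max(candidates, key=lambda c: len(q_tokens & tokens(c)))
        pairs.append((ex.question, best))
    return pairs


# Toy data, invented for this example.
examples = [
    QAExample("When was the Eiffel Tower completed?",
              "The Eiffel Tower in Paris was completed in 1889.", "1889"),
    QAExample("How tall is the Eiffel Tower?",
              "The Eiffel Tower is 330 metres tall and stands in Paris.",
              "330 metres"),
    QAExample("Where is the Louvre?",
              "The Louvre is a museum in Paris.", "Paris"),
]
pairs = rematch(examples)
```

In this toy run the two Eiffel Tower questions are paired with each other's contexts, which are topically close but lack the answer, while the Louvre question is skipped because every other context contains the string "Paris".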
Related papers
- Long-form Question Answering: An Iterative Planning-Retrieval-Generation
Approach [28.849548176802262]
Long-form question answering (LFQA) poses a challenge as it involves generating detailed answers in the form of paragraphs.
We propose an LFQA model with iterative Planning, Retrieval, and Generation.
We find that our model outperforms the state-of-the-art models on various textual and factual metrics for the LFQA task.
arXiv Detail & Related papers (2023-11-15T21:22:27Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and
Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Two-Step Question Retrieval for Open-Domain QA [27.37731471419776]
The retriever-reader pipeline has shown promising performance in open-domain QA but suffers from a very slow inference speed.
Recently proposed question retrieval models tackle this problem by indexing question-answer pairs and searching for similar questions.
SQuID significantly increases the performance of existing question retrieval models with a negligible loss on inference speed.
arXiv Detail & Related papers (2022-05-19T08:46:14Z)
- Co-VQA : Answering by Interactive Sub Question Sequence [18.476819557695087]
This paper proposes a conversation-based VQA framework, which consists of three components: Questioner, Oracle, and Answerer.
To perform supervised learning for each model, we introduce a well-designed method to build a SQS for each question on VQA 2.0 and VQA-CP v2 datasets.
arXiv Detail & Related papers (2022-04-02T15:09:16Z)
- Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions.
We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions.
arXiv Detail & Related papers (2021-09-21T17:59:31Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
- OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop
Approach [11.057028572260064]
We propose a model named OneStop to generate QA pairs from documents in a one-stop approach.
Specifically, questions and their corresponding answer spans are extracted simultaneously.
OneStop is much more efficient to be trained and deployed in industrial scenarios since it involves only one model to solve the complex QA generation task.
arXiv Detail & Related papers (2021-02-24T08:45:00Z)
- Summary-Oriented Question Generation for Informational Queries [23.72999724312676]
We aim to produce self-explanatory questions that focus on main document topics and are answerable with variable length passages as appropriate.
Our model shows SOTA performance of SQ generation on the NQ dataset (20.1 BLEU-4).
We further apply our model on out-of-domain news articles, evaluating with a QA system due to the lack of gold questions and demonstrate that our model produces better SQs for news articles -- with further confirmation via a human evaluation.
arXiv Detail & Related papers (2020-10-19T17:30:08Z)
- Tell Me How to Ask Again: Question Data Augmentation with Controllable
Rewriting in Continuous Space [94.8320535537798]
We propose Controllable Rewriting based Question Data Augmentation (CRQDA) for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks.
We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples.
arXiv Detail & Related papers (2020-10-04T03:13:46Z)
- AmbigQA: Answering Ambiguous Open-domain Questions [99.59747941602684]
We introduce AmbigQA, a new open-domain question answering task which involves finding every plausible answer.
To study this task, we construct AmbigNQ, a dataset covering 14,042 questions from NQ-open.
We find that over half of the questions in NQ-open are ambiguous, with diverse sources of ambiguity such as event and entity references.
arXiv Detail & Related papers (2020-04-22T15:42:13Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, also marginally improving performance on the Reasoning questions in VQA, while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.