Unsupervised multiple choices question answering via universal corpus
- URL: http://arxiv.org/abs/2402.17333v1
- Date: Tue, 27 Feb 2024 09:10:28 GMT
- Title: Unsupervised multiple choices question answering via universal corpus
- Authors: Qin Zhang, Hao Ge, Xiaojun Chen, Meng Fang
- Abstract summary: We propose a novel framework designed to generate synthetic multiple choice question answering (MCQA) data.
We leverage both named entities (NE) and knowledge graphs to discover plausible distractors to form complete synthetic samples.
- Score: 27.78825771434918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised question answering is a promising yet challenging task, which
alleviates the burden of building large-scale annotated data in a new domain.
It motivates us to study the unsupervised multiple-choice question answering
(MCQA) problem. In this paper, we propose a novel framework designed to
generate synthetic MCQA data based solely on contexts from a universal-domain
corpus, without relying on any form of manual annotation. Possible answers are
extracted and used to produce related questions, then we leverage both named
entities (NE) and knowledge graphs to discover plausible distractors to form
complete synthetic samples. Experiments on multiple MCQA datasets demonstrate
the effectiveness of our method.
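The pipeline sketched in the abstract (extract candidate answers, turn the context into a question, mine same-type distractors) can be illustrated with a toy stand-in; the function and the hand-built entity pool below are hypothetical, and in the paper an NER model plus knowledge-graph lookups take their place:

```python
import random

def make_mcqa_sample(context, answer, entity_pool, num_distractors=3, seed=0):
    """Build one synthetic MCQA item from a context passage.

    The answer span is blanked out to form a cloze-style question, and
    distractors are drawn from other entities of the same type -- a toy
    stand-in for the NE/knowledge-graph matching described in the abstract.
    """
    question = context.replace(answer, "_____", 1)
    answer_type = entity_pool[answer]
    # plausible distractors: same entity type, not the answer, absent from context
    candidates = [ent for ent, etype in entity_pool.items()
                  if etype == answer_type and ent != answer and ent not in context]
    rng = random.Random(seed)
    distractors = rng.sample(candidates, min(num_distractors, len(candidates)))
    options = distractors + [answer]
    rng.shuffle(options)
    return {"question": question, "options": options, "answer": answer}
```

A real system would replace the cloze transform with a learned question generator and filter distractors for plausibility.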
Related papers
- Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models.
Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
arXiv Detail & Related papers (2024-02-26T05:31:34Z)
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- LIQUID: A Framework for List Question Answering Dataset Generation [17.86721740779611]
We propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora.
We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers.
We then create questions using an off-the-shelf question generator with the extracted entities and original passage.
Using our synthetic data, we significantly improve the performance of the previous best list QA models by exact-match F1 scores of 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
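The entity-extraction step above can be sketched as a grouping pass: entities of the same type found in the summary become one candidate *list* answer. The helper below is hypothetical, not the authors' code:

```python
def candidate_list_answers(entities, min_size=2):
    """Group extracted (entity, type) pairs by type; each sufficiently
    large same-type group becomes one candidate list answer for a
    generated list question, mirroring LIQUID's use of named entities
    from the summarized passage."""
    by_type = {}
    for ent, etype in entities:
        by_type.setdefault(etype, []).append(ent)
    return {t: ents for t, ents in by_type.items() if len(ents) >= min_size}
```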
arXiv Detail & Related papers (2023-02-03T12:42:45Z)
- Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LLMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we reach the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
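The iterate-decompose-solve loop can be sketched as follows, with `decompose` and `solve` standing in for the paper's two LM prompts (toy callables and the `ANS` placeholder are assumptions, not the authors' interface):

```python
def successive_prompting(question, decompose, solve, max_steps=10):
    """Iteratively split off one simple sub-question, solve it, substitute
    its answer (via the placeholder ANS) into the remaining question, and
    repeat until nothing is left to decompose."""
    answer = None
    for _ in range(max_steps):
        sub_question, remaining = decompose(question)
        if answer is not None:
            sub_question = sub_question.replace("ANS", str(answer))
        answer = solve(sub_question)
        if remaining is None:
            return answer
        question = remaining
    raise RuntimeError("question did not fully decompose")
```

In the paper both steps are LM calls; here any callables with the same shape work.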
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
- Multi-Type Conversational Question-Answer Generation with Closed-ended and Unanswerable Questions [3.6825890616838066]
Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context.
We introduce a novel method to synthesize data for CQA with various question types, including open-ended, closed-ended, and unanswerable questions.
Across four domains, CQA systems trained on our synthetic data indeed show good performance close to the systems trained on human-annotated data.
arXiv Detail & Related papers (2022-10-24T07:01:51Z)
- DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases [81.19499764899359]
We propose a novel framework DecAF that jointly generates both logical forms and direct answers.
DecAF achieves new state-of-the-art accuracy on WebQSP, FreebaseQA, and GrailQA benchmarks.
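The joint-generation idea can be illustrated with a toy combiner that keeps whichever decoding route wins; all names below are hypothetical and this is not DecAF's actual implementation:

```python
def combine_decodings(direct_answer, logical_form, execute_on_kb,
                      direct_score, lf_score):
    """Execute the generated logical form against the knowledge base and
    return its result when execution succeeds and it scores at least as
    high as the directly generated answer; otherwise fall back to the
    direct answer."""
    executed = execute_on_kb(logical_form)
    if executed is not None and lf_score >= direct_score:
        return executed
    return direct_answer
```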
arXiv Detail & Related papers (2022-09-30T19:51:52Z)
- Mixture of Experts for Biomedical Question Answering [34.92691831878302]
We propose a Mixture-of-Expert (MoE) based question answering method called MoEBQA.
MoEBQA decouples the computation for different types of questions by sparse routing.
We evaluate MoEBQA on three Biomedical Question Answering (BQA) datasets constructed based on real examinations.
arXiv Detail & Related papers (2022-04-15T14:11:40Z)
- AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization [73.91543616777064]
Community Question Answering (CQA) forums such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions.
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
This work introduces a novel dataset of 4,631 CQA threads for answer summarization, curated by professional linguists.
arXiv Detail & Related papers (2021-11-11T21:48:02Z)
- Multi-Perspective Abstractive Answer Summarization [76.10437565615138]
Community Question Answering forums contain a rich resource of answers to a wide range of questions.
The goal of multi-perspective answer summarization is to produce a summary that includes all perspectives of the answer.
This work introduces a novel dataset creation method to automatically create multi-perspective, bullet-point abstractive summaries.
arXiv Detail & Related papers (2021-04-17T13:15:29Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.