COVIDRead: A Large-scale Question Answering Dataset on COVID-19
- URL: http://arxiv.org/abs/2110.09321v1
- Date: Tue, 5 Oct 2021 07:38:06 GMT
- Title: COVIDRead: A Large-scale Question Answering Dataset on COVID-19
- Authors: Tanik Saikh, Sovan Kumar Sahoo, Asif Ekbal, Pushpak Bhattacharyya
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During this pandemic, extracting relevant information about
COVID-19 would be immensely beneficial to the community at large. In this
paper, we present an important new resource, COVIDRead, a Stanford Question
Answering Dataset (SQuAD)-like dataset of more than 100k question-answer
pairs. The dataset consists of Context-Answer-Question triples. The questions
are first generated automatically from the contexts, and the system-generated
questions are then manually checked by human annotators. This resource could
serve many purposes, ranging from queries by the general public about this
highly unusual disease to the management of articles by the editors and
associate editors of a journal. We establish several end-to-end
neural-network-based baseline models, which attain F1 scores ranging from
32.03% (lowest) to 37.19% (highest). To the best of our knowledge, we are the
first to provide a QA dataset of this kind and volume on COVID-19. The dataset
opens a new avenue for research on COVID-19 by providing a benchmark dataset
and baseline models.
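The abstract reports F1 without defining it. Below is a minimal sketch, assuming COVIDRead is scored like SQuAD v1.1 (token-overlap F1 after lowercasing and stripping punctuation and English articles); the abstract does not confirm this. The `Triple` record for one Context-Answer-Question example, its field names, and `corpus_f1` are hypothetical illustrations, not the dataset's published schema.

```python
import collections
import re
import string
from dataclasses import dataclass

_PUNCT = set(string.punctuation)


@dataclass
class Triple:
    """One Context-Answer-Question example (hypothetical SQuAD-style schema)."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer span in the context

    def span_is_valid(self) -> bool:
        # In SQuAD-style data the answer must be an exact substring of the context.
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end] == self.answer_text


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between one predicted and one gold answer string."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts tokens shared by prediction and gold answer.
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions: list[str], golds: list[str]) -> float:
    """Mean per-question F1, expressed as a percentage (assumed aggregation)."""
    scores = [f1_score(p, g) for p, g in zip(predictions, golds)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    ctx = "The virus spreads mainly through respiratory droplets."
    ex = Triple(context=ctx, question="How does the virus spread?",
                answer_text="respiratory droplets",
                answer_start=ctx.index("respiratory"))
    print(ex.span_is_valid())  # True
    print(round(f1_score("through respiratory droplets", ex.answer_text), 2))  # 0.8
```

In the usage example (an invented sentence, not taken from the dataset), the prediction shares 2 of its 3 tokens with the 2-token gold answer, so precision is 2/3, recall is 1, and F1 is 0.8. Partial overlaps earn partial credit, which is why span-extraction scores such as 37.19% are usually read alongside an exact-match metric.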
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models (arXiv, 2023-10-17)
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or the question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
- Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories (arXiv, 2023-06-15)
Encyclopedic-VQA is a large-scale visual question answering dataset.
It contains 221k unique question+answer pairs, each matched with up to 5 images.
Our dataset comes with a controlled knowledge base derived from Wikipedia.
- IIRC: A Dataset of Incomplete Information Reading Comprehension Questions (arXiv, 2020-11-13)
We present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia.
The questions were written by crowd workers who did not have access to any of the linked documents.
We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset.
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation (arXiv, 2020-06-10)
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question answering.
We release this dataset to foster research into clarification question generation, with the larger goal of enhancing dialog and question answering systems.
- What Are People Asking About COVID-19? A Question Classification Dataset (arXiv, 2020-05-26)
We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources.
The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID.
Many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA.
- Rapidly Bootstrapping a Question Answering Dataset for COVID-19 (arXiv, 2020-04-23)
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19.
This is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available.