What Are People Asking About COVID-19? A Question Classification Dataset
- URL: http://arxiv.org/abs/2005.12522v3
- Date: Fri, 8 Sep 2023 21:44:52 GMT
- Title: What Are People Asking About COVID-19? A Question Classification Dataset
- Authors: Jerry Wei, Chengyu Huang, Soroush Vosoughi, Jason Wei
- Abstract summary: We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources.
The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID.
Many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA.
- Score: 56.609360198598914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources,
which we annotate into 15 question categories and 207 question clusters. The
most common questions in our dataset asked about transmission, prevention, and
societal effects of COVID, and we found that many questions that appeared in
multiple sources were not answered by any FAQ websites of reputable
organizations such as the CDC and FDA. We post our dataset publicly at
https://github.com/JerryWeiAI/COVID-Q. For classifying questions into 15
categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples
per category, and for a question clustering task, a BERT + triplet loss
baseline achieved 49.5% accuracy. We hope COVID-Q can be useful either directly
in developing applied systems or as a domain-specific resource for model
evaluation.
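The question clustering baseline described above combines BERT embeddings with a triplet loss, which pulls an anchor question toward a question from the same cluster and pushes it away from one in a different cluster. A minimal sketch of the loss itself is below; the toy 2-D vectors stand in for BERT sentence embeddings and are purely illustrative, not values from the dataset.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Zero when the negative is already farther from the anchor than the
    positive by at least `margin`; positive otherwise.
    """
    d_pos = np.linalg.norm(anchor - positive)   # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative)   # anchor-to-negative distance
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings standing in for BERT vectors (hypothetical values).
a = np.array([1.0, 0.0])    # anchor question
p = np.array([1.1, 0.0])    # question from the same cluster
n = np.array([-1.0, 0.0])   # question from a different cluster
loss = triplet_loss(a, p, n)  # 0.0: the negative is well beyond the margin
```

In the actual baseline, this loss would be backpropagated through the BERT encoder during fine-tuning so that questions in the same cluster map to nearby embeddings; at test time, questions can then be assigned to the cluster of their nearest training example.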
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765]
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z) - RxWhyQA: a clinical question-answering dataset with the challenge of multi-answer questions [4.017119245460155]
We create a dataset for the development and evaluation of clinical question-answering systems that can handle multi-answer questions.
The 1-to-0 and 1-to-N drug-reason relations formed the unanswerable and multi-answer entries.
arXiv Detail & Related papers (2022-01-07T15:58:58Z) - PerCQA: Persian Community Question Answering Dataset [2.503043323723241]
Community Question Answering (CQA) forums provide answers for many real-life questions.
We present PerCQA, the first Persian dataset for CQA.
This dataset contains the questions and answers crawled from the most well-known Persian forum.
arXiv Detail & Related papers (2021-12-25T14:06:41Z) - COVIDRead: A Large-scale Question Answering Dataset on COVID-19 [41.23094507923245]
We present COVIDRead, a SQuAD-style (Stanford Question Answering Dataset) resource of more than 100k question-answer pairs.
This resource could serve many purposes, from answering everyday questions about this novel disease to helping journal editors and associate editors manage articles.
We establish several end-to-end neural network baseline models, with F1 scores ranging from 32.03% to 37.19%.
arXiv Detail & Related papers (2021-10-05T07:38:06Z) - Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions.
We show that RGPT-QA achieves absolute improvements of 2.2%, 2.4%, and 6.3% in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively.
arXiv Detail & Related papers (2021-09-21T17:59:31Z) - SituatedQA: Incorporating Extra-Linguistic Contexts into QA [7.495151447459443]
We introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given the temporal or geographical context.
We find that a significant proportion of information seeking questions have context-dependent answers.
Our study shows that existing models struggle with producing answers that are frequently updated or from uncommon locations.
arXiv Detail & Related papers (2021-09-13T17:53:21Z) - GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z) - Transformer-Based Models for Question Answering on COVID19 [4.631723879329972]
We propose three transformer-based question-answering systems using BERT, ALBERT, and T5 models.
The BERT-based QA system achieved the highest F1 score (26.32), while the ALBERT-based QA system achieved the highest Exact Match score (13.04).
arXiv Detail & Related papers (2021-01-16T23:06:30Z) - Rapidly Bootstrapping a Question Answering Dataset for COVID-19 [88.86456834766288]
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19.
This is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available.
arXiv Detail & Related papers (2020-04-23T17:35:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.