Rapidly Bootstrapping a Question Answering Dataset for COVID-19
- URL: http://arxiv.org/abs/2004.11339v1
- Date: Thu, 23 Apr 2020 17:35:11 GMT
- Title: Rapidly Bootstrapping a Question Answering Dataset for COVID-19
- Authors: Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam,
Kyunghyun Cho, Jimmy Lin
- Abstract summary: We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19.
This is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available.
- Score: 88.86456834766288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present CovidQA, the beginnings of a question answering dataset
specifically designed for COVID-19, built by hand from knowledge gathered from
Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is
the first publicly available resource of its type, and intended as a stopgap
measure for guiding research until more substantial evaluation resources become
available. While this dataset, comprising 124 question-article pairs as of the
present version 0.1 release, does not have sufficient examples for supervised
machine learning, we believe that it can be helpful for evaluating the
zero-shot or transfer capabilities of existing models on topics specifically
related to COVID-19. This paper describes our methodology for constructing the
dataset and presents the effectiveness of a number of baselines, including
term-based techniques and various transformer-based models. The dataset is
available at http://covidqa.ai/
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories [41.2406955639537]
Encyclopedic-VQA is a large-scale visual question answering dataset.
It contains 221k unique question+answer pairs each matched with (up to) 5 images.
Our dataset comes with a controlled knowledge base derived from Wikipedia.
arXiv Detail & Related papers (2023-06-15T16:03:01Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- COVIDRead: A Large-scale Question Answering Dataset on COVID-19 [41.23094507923245]
We present an important resource, COVIDRead, a SQuAD-like (Stanford Question Answering Dataset) dataset of more than 100k question-answer pairs.
This is a valuable resource that could serve many purposes, from answering queries posed by the general public about this uncommon disease to helping journal editors and associate editors manage articles.
We establish several end-to-end neural network based baseline models that attain F1 scores ranging from 32.03% to 37.19%.
arXiv Detail & Related papers (2021-10-05T07:38:06Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comments extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- What do Models Learn from Question Answering Datasets? [2.28438857884398]
We investigate if models are learning reading comprehension from question answering datasets.
We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect data, and ability to handle question variations.
We make recommendations for building future QA datasets that better evaluate the task of question answering through reading comprehension.
arXiv Detail & Related papers (2020-04-07T15:41:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.