UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language
- URL: http://arxiv.org/abs/2209.06668v1
- Date: Wed, 14 Sep 2022 14:24:23 GMT
- Title: UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language
- Authors: Triet Minh Thai, Ngan Ha-Thao Chu, Anh Tuan Vo, Son T. Luu
- Abstract summary: We present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing COVID-19 question answering systems.
The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Over the two years from 2020 to 2021, COVID-19 overwhelmed disease
prevention measures in many countries, including Vietnam, and negatively
affected many aspects of human life and society. Misleading information and
fake news about the pandemic circulating in the community are also serious
problems. We therefore present UIT-ViCoV19QA, the first Vietnamese
community-based question answering dataset for developing COVID-19 question
answering systems. The dataset comprises 4,500 question-answer pairs collected
from trusted medical sources, with at least one answer and at most four unique
paraphrased answers per question. Along with the dataset, we set up various
deep learning models as baselines to assess the quality of the dataset and to
establish benchmark results for further research, using commonly used metrics
such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effect of
having multiple paraphrased answers on these models, especially on the
Transformer, a dominant architecture in the field.
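The abstract mentions scoring generated answers with ROUGE-L against up to four paraphrased reference answers per question. The following is a minimal sketch of how such multi-reference scoring could work, not the authors' evaluation code. It assumes whitespace tokenization (Vietnamese text would normally need proper word segmentation), uses a plain F1 of LCS-based precision and recall, and takes the best score over the references, which is one common convention; the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (assumption: whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_ref_rouge_l(candidate, references):
    """Score a candidate against several references, keeping the best match.

    Assumption: max over references; the paper's aggregation may differ.
    """
    return max(rouge_l_f1(candidate, r) for r in references)


if __name__ == "__main__":
    # Toy English strings for illustration only; not taken from the dataset.
    answer = "wash your hands often"
    paraphrases = [
        "wash hands often with soap",
        "you should wash your hands often",
    ]
    print(round(multi_ref_rouge_l(answer, paraphrases), 3))
```

With more paraphrased references, a correct answer that is worded differently has a better chance of matching at least one of them, which is one intuition for why multiple references can help evaluation.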
Related papers
- Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering [0.0]
Generative Pre-trained Transformer (GPT) has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems.
This paper presents an implementation of GPT-2 for community-based question answering specifically focused on COVID-19 related queries in Vietnamese.
(2023-10-23T06:14:07Z)
- SPBERTQA: A Two-Stage Question Answering System Based on Sentence Transformers for Medical Texts [2.5199066832791535]
This paper proposes a two-stage QA system based on Sentence-BERT (SBERT) using multiple negatives ranking (MNR) loss combined with BM25.
With the obtained results, this system achieves better performance than traditional methods.
(2022-06-20T07:07:59Z)
- COVID-19 Named Entity Recognition for Vietnamese [6.17059264011429]
We present the first manually-annotated COVID-19 domain-specific dataset for Vietnamese.
Our dataset is annotated for the named entity recognition task with newly-defined entity types.
Our dataset also contains the largest number of entities compared to existing Vietnamese NER datasets.
(2021-04-08T16:35:34Z)
- A Vietnamese Dataset for Evaluating Machine Reading Comprehension [2.7528170226206443]
We present UIT-ViQuAD, a new dataset for Vietnamese, a low-resource language, to evaluate machine reading comprehension models.
This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia.
We conduct experiments on state-of-the-art MRC methods for English and Chinese as the first experimental models on UIT-ViQuAD.
(2020-09-30T15:06:56Z)
- Understanding the temporal evolution of COVID-19 research through machine learning and natural language processing [66.63200823918429]
The outbreak of the novel coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been continuously affecting human lives and communities around the world.
We used multiple data sources, i.e., PubMed and ArXiv, and built several machine learning models to characterize the landscape of current COVID-19 research.
Our findings confirm the types of research available in PubMed and ArXiv differ significantly, with the former exhibiting greater diversity in terms of COVID-19 related issues.
(2020-07-22T18:02:39Z)
- A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19 [4.676651062800037]
COVID-19 has caused more than 7.4 million cases and over 418,000 deaths.
Online communities, forums, and social media provide potential venues to search for relevant questions and answers.
We propose to apply a language model for automatically answering questions related to COVID-19 and qualitatively evaluate the generated responses.
(2020-06-19T05:13:57Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
(2020-06-17T01:32:48Z)
- Cross-lingual Transfer Learning for COVID-19 Outbreak Alignment [90.12602012910465]
We train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries.
Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions.
(2020-06-05T02:04:25Z)
- What Are People Asking About COVID-19? A Question Classification Dataset [56.609360198598914]
We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources.
The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID.
Many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA.
(2020-05-26T05:41:58Z)
- Rapidly Bootstrapping a Question Answering Dataset for COVID-19 [88.86456834766288]
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19.
This is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available.
(2020-04-23T17:35:11Z)
- Mapping the Landscape of Artificial Intelligence Applications against COVID-19 [59.30734371401316]
COVID-19, the disease caused by the SARS-CoV-2 virus, has been declared a pandemic by the World Health Organization.
We present an overview of recent studies using Machine Learning and, more broadly, Artificial Intelligence to tackle many aspects of the COVID-19 crisis.
(2020-03-25T12:30:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.