New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
- URL: http://arxiv.org/abs/2006.11138v2
- Date: Thu, 11 Feb 2021 12:50:41 GMT
- Title: New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
- Authors: Kiet Van Nguyen, Tin Van Huynh, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen
- Abstract summary: This paper presents ViNewsQA as a new corpus for the Vietnamese language to evaluate healthcare reading comprehension models.
The corpus comprises 22,057 human-generated question-answer pairs.
Our experiments show that the best machine model is ALBERT, which achieves an exact match score of 65.26% and an F1-score of 84.89% on our corpus.
- Score: 2.5199066832791535
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale and high-quality corpora are necessary for evaluating machine reading comprehension models on a low-resource language like Vietnamese. In addition, machine reading comprehension (MRC) for the health domain offers great potential for practical applications; however, there is still very little MRC research in this domain. This paper presents ViNewsQA, a new corpus for evaluating healthcare reading comprehension models in Vietnamese. The corpus comprises 22,057 human-generated question-answer pairs. Crowd workers created the questions and their answers based on a collection of over 4,416 online Vietnamese healthcare news articles, where each answer is a span extracted from the corresponding article. In particular, we develop a process for creating a Vietnamese machine reading comprehension corpus. Comprehensive evaluations demonstrate that our corpus requires abilities beyond simple word matching, demanding difficult reasoning over information from single or multiple sentences. We conduct experiments with different types of machine reading comprehension methods to establish the first baseline performances, against which further models can be compared. We also measure human performance on the corpus and compare it with several powerful neural network-based and transfer learning-based models. Our experiments show that the best machine model is ALBERT, which achieves an exact match (EM) score of 65.26% and an F1-score of 84.89% on our corpus. The significant gaps between humans and the best-performing model on the test set (14.53% in EM and 10.90% in F1-score) indicate that improvements on ViNewsQA can be explored in future studies. Our corpus is publicly available on our website for research purposes, to encourage the research community to make these improvements.
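ViNewsQA is a span-extraction corpus, and the reported results use exact match (EM) and token-level F1, the standard metrics for this task type. The sketch below is a minimal illustration of how such scores are typically computed; it is not the authors' evaluation script, and the record layout, field names, and Vietnamese example are assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the ViNewsQA evaluation script): exact match (EM)
# and token-level F1 for span-extraction reading comprehension.
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace; real evaluation scripts may also
    # strip punctuation before comparing spans.
    return " ".join(text.lower().split())

def exact_match(prediction: str, gold: str) -> float:
    # 1.0 only if the normalized predicted span equals the normalized gold span.
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    # Token-overlap F1 between the predicted and gold answer spans.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ViNewsQA-style record: the answer is a span of the source
# article, located by a character offset (field names are assumptions).
sample = {
    "context": "Bộ Y tế khuyến cáo người dân rửa tay thường xuyên để phòng bệnh.",
    "question": "Người dân nên làm gì để phòng bệnh?",
    "answer_text": "rửa tay thường xuyên",
    "answer_start": 29,
}

# A prediction that covers the gold span plus extra tokens scores 0 EM but
# keeps partial F1 credit.
prediction = "rửa tay thường xuyên để phòng bệnh"
print("EM:", exact_match(prediction, sample["answer_text"]))  # 0.0
print("F1:", round(f1_score(prediction, sample["answer_text"]), 3))  # ~0.727
```

This difference between the two metrics is one reason the EM and F1 figures reported in the abstract (65.26% EM versus 84.89% F1) can diverge: partially correct spans earn F1 credit but no EM credit.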
Related papers
- VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension [1.3942150186842373]
This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension tasks.
The existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks.
In contrast, VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube.
arXiv Detail & Related papers (2024-02-05T00:54:40Z)
- KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis [1.2183405753834562]
This paper describes the system entered by the author to the SemEval-2023 Task 12: Sentiment analysis for African languages.
The system focuses on the Kinyarwanda language and uses a language-specific model.
arXiv Detail & Related papers (2023-04-25T04:30:03Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare the unified format labeling, data split, and evaluation metrics for new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
arXiv Detail & Related papers (2022-01-17T16:23:33Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
- Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts [0.2446672595462589]
We present UIT-ViCoQA, a new Vietnamese corpus for conversational machine reading comprehension.
UIT-ViCoQA consists of 10,000 questions with answers over 2,000 conversations about health news articles.
The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement.
arXiv Detail & Related papers (2021-05-04T14:50:39Z)
- An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets [94.37521840642141]
We present the full-size Russian complexly NER-labeled corpus of Internet user reviews.
A set of advanced deep learning neural networks is used to extract pharmacologically meaningful entities from Russian texts.
arXiv Detail & Related papers (2021-04-30T19:46:24Z)
- An Experimental Study of Deep Neural Network Models for Vietnamese Multiple-Choice Reading Comprehension [2.7528170226206443]
We conduct experiments on neural network-based models to understand the impact of word representations on machine reading comprehension.
Our experiments include using the Co-match model with six different Vietnamese word embeddings and the BERT model for multiple-choice reading comprehension.
On the ViMMRC corpus, the BERT model achieves an accuracy of 61.28% on the test set.
arXiv Detail & Related papers (2020-08-20T07:29:14Z)
- Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension [2.5199066832791535]
We construct a dataset which consists of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts.
We propose a lexical-based MRC method that utilizes semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text.
Our proposed method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model.
arXiv Detail & Related papers (2020-01-16T08:09:51Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)