New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
- URL: http://arxiv.org/abs/2006.11138v2
- Date: Thu, 11 Feb 2021 12:50:41 GMT
- Title: New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
- Authors: Kiet Van Nguyen, Tin Van Huynh, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen
- Abstract summary: This paper presents ViNewsQA as a new corpus for the Vietnamese language to evaluate healthcare reading comprehension models.
The corpus comprises 22,057 human-generated question-answer pairs.
Our experiments show that the best machine model is ALBERT, which achieves an exact match score of 65.26% and an F1-score of 84.89% on our corpus.
- Score: 2.5199066832791535
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale and high-quality corpora are necessary for evaluating machine reading comprehension models on a low-resource language like Vietnamese. In addition, machine reading comprehension (MRC) for the health domain offers great potential for practical applications; however, there is still very little MRC research in this domain. This paper presents ViNewsQA, a new corpus for evaluating healthcare reading comprehension models in Vietnamese. The corpus comprises 22,057 human-generated question-answer pairs. Crowd workers created the questions and their answers based on a collection of over 4,416 online Vietnamese healthcare news articles, where each answer is a span extracted from the corresponding article. In particular, we develop a process for creating a Vietnamese machine reading comprehension corpus. Comprehensive evaluations demonstrate that our corpus requires abilities beyond simple word matching, demanding difficult reasoning over information from single or multiple sentences. We conduct experiments with different types of machine reading comprehension methods to establish the first baseline performances, against which further models can be compared. We also measure human performance on the corpus and compare it with several powerful neural network-based and transfer learning-based models. Our experiments show that the best machine model is ALBERT, which achieves an exact match (EM) score of 65.26% and an F1-score of 84.89% on our corpus. The significant gaps between humans and the best-performing model on the test set (14.53% in EM and 10.90% in F1-score) indicate that improvements on ViNewsQA can be explored in future studies. Our corpus is publicly available on our website for research purposes, to encourage the research community to make these improvements.
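ViNewsQA is a span-extraction corpus, and the reported results use exact match (EM) and token-level F1, the standard metrics for this task type. The sketch below is a minimal illustration of how such scores are typically computed; it is not the authors' evaluation script, and the record layout, field names, and Vietnamese example are assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the ViNewsQA evaluation script): exact match (EM)
# and token-level F1 for span-extraction reading comprehension.
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace; real evaluation scripts may also
    # strip punctuation before comparing spans.
    return " ".join(text.lower().split())

def exact_match(prediction: str, gold: str) -> float:
    # 1.0 only if the normalized predicted span equals the normalized gold span.
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    # Token-overlap F1 between the predicted and gold answer spans.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ViNewsQA-style record: the answer is a span of the source
# article, located by a character offset (field names are assumptions).
sample = {
    "context": "Bộ Y tế khuyến cáo người dân rửa tay thường xuyên để phòng bệnh.",
    "question": "Người dân nên làm gì để phòng bệnh?",
    "answer_text": "rửa tay thường xuyên",
    "answer_start": 29,
}

# A prediction that covers the gold span plus extra tokens scores 0 EM but
# keeps partial F1 credit.
prediction = "rửa tay thường xuyên để phòng bệnh"
print("EM:", exact_match(prediction, sample["answer_text"]))  # 0.0
print("F1:", round(f1_score(prediction, sample["answer_text"]), 3))  # ~0.727
```

This difference between the two metrics is one reason the EM and F1 figures reported in the abstract (65.26% EM versus 84.89% F1) can diverge: partially correct spans earn F1 credit but no EM credit.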
Related papers
- VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension [1.3942150186842373]
This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension tasks.
The existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks.
In contrast, VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube.
arXiv Detail & Related papers (2024-02-05T00:54:40Z)
- KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis [1.2183405753834562]
This paper describes the system entered by the author to the SemEval-2023 Task 12: Sentiment analysis for African languages.
The system focuses on the Kinyarwanda language and uses a language-specific model.
arXiv Detail & Related papers (2023-04-25T04:30:03Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare the unified format labeling, data split, and evaluation metrics for new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
arXiv Detail & Related papers (2022-01-17T16:23:33Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
- Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts [0.2446672595462589]
We present UIT-ViCoQA, a new Vietnamese corpus for conversational machine reading comprehension.
UIT-ViCoQA consists of 10,000 questions with answers over 2,000 conversations about health news articles.
The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement.
arXiv Detail & Related papers (2021-05-04T14:50:39Z)
- An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets [94.37521840642141]
We present the full-size Russian complexly NER-labeled corpus of Internet user reviews.
A set of advanced deep learning neural networks is used to extract pharmacologically meaningful entities from Russian texts.
arXiv Detail & Related papers (2021-04-30T19:46:24Z)
- An Experimental Study of Deep Neural Network Models for Vietnamese Multiple-Choice Reading Comprehension [2.7528170226206443]
We conduct experiments on neural network-based models to understand the impact of word representations on machine reading comprehension.
Our experiments include using the Co-match model with six different Vietnamese word embeddings and the BERT model for multiple-choice reading comprehension.
On the ViMMRC corpus, the BERT model achieves an accuracy of 61.28% on the test set.
arXiv Detail & Related papers (2020-08-20T07:29:14Z)
- Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension [2.5199066832791535]
We construct a dataset which consists of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts.
We propose a lexical-based MRC method that utilizes semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text.
Our proposed method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model.
arXiv Detail & Related papers (2020-01-16T08:09:51Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)