A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education
- URL: http://arxiv.org/abs/2303.18162v1
- Date: Fri, 31 Mar 2023 15:54:54 GMT
- Title: A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education
- Authors: Son T. Luu, Khoi Trong Hoang, Tuong Quang Pham, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
- Abstract summary: ViMMRC 2.0 is an extension of the previous ViMMRC for the task of multiple-choice reading comprehension in Vietnamese Textbooks.
The dataset contains 699 reading passages (prose and poems) and 5,273 questions.
Our multi-stage models achieve 58.81% accuracy on the test set, 5.34% higher than the best BERTology model.
- Score: 2.5199066832791535
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine reading comprehension has been an interesting and challenging task in recent years, with the purpose of extracting useful information from texts. To help computers understand a reading text and answer relevant questions, we introduce ViMMRC 2.0, an extension of the previous ViMMRC corpus for multiple-choice reading comprehension in Vietnamese textbooks, which contain reading passages for students from Grade 1 to Grade 12. The dataset contains 699 reading passages (prose and poems) and 5,273 questions. Unlike the previous version, the questions in the new dataset are not fixed to four options. Moreover, the difficulty of the questions is increased, which challenges models to find the correct choice: the computer must understand the whole context of the reading passage, the question, and the content of each choice to extract the right answer. Hence, we propose a multi-stage approach that combines a multi-step attention network (MAN) with the natural language inference (NLI) task to enhance the performance of the reading comprehension model. We then compare the proposed methodology with baseline BERTology models on the new dataset and on ViMMRC 1.0. Our multi-stage models achieve 58.81% accuracy on the test set, 5.34% higher than the best BERTology model. From the error analysis, we find that the main challenge for reading comprehension models is understanding the implicit context in texts and linking it together to find the correct answers. Finally, we hope our new dataset will motivate further research into enhancing the language understanding ability of computers for the Vietnamese language.
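The authors' code is not part of this summary, but the two-stage idea in the abstract can be illustrated with a minimal sketch: stage one scores each answer option against the passage with a multi-step attention (MAN-style) reader, and stage two fuses those scores with NLI-style entailment probabilities for the "question + option" hypothesis against the passage. The class names, hop count, and fusion weight below are illustrative assumptions, not the paper's implementation.
```python
import torch
import torch.nn as nn

class MultiStepAttentionScorer(nn.Module):
    """Scores one option against a passage with several attention hops (MAN-style)."""
    def __init__(self, hidden: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.query_proj = nn.Linear(hidden, hidden)
        self.gru = nn.GRUCell(hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, passage: torch.Tensor, option: torch.Tensor) -> torch.Tensor:
        # passage: (batch, seq, hidden) token states from an encoder
        # option:  (batch, hidden) pooled question+option representation
        state = option
        for _ in range(self.steps):
            # attend over passage tokens, using the current state as the query
            attn = torch.softmax(
                torch.einsum("bsh,bh->bs", passage, self.query_proj(state)), dim=-1)
            context = torch.einsum("bs,bsh->bh", attn, passage)
            state = self.gru(context, state)      # refine the query at each hop
        return self.score(state).squeeze(-1)      # (batch,) score for this option


def combine_stages(man_scores, nli_entail_probs, alpha: float = 0.5):
    """Stage-two fusion: mix MAN option scores with NLI entailment probabilities."""
    return alpha * torch.log_softmax(man_scores, dim=-1) + \
           (1 - alpha) * torch.log(nli_entail_probs + 1e-9)


# Toy usage with random tensors standing in for real encoder outputs.
batch, seq, hidden, num_options = 2, 50, 768, 4
scorer = MultiStepAttentionScorer(hidden)
passage_states = torch.randn(batch, seq, hidden)
option_vecs = torch.randn(batch, num_options, hidden)
man_scores = torch.stack(
    [scorer(passage_states, option_vecs[:, i]) for i in range(num_options)], dim=-1)
nli_probs = torch.rand(batch, num_options)        # placeholder entailment probabilities
predicted = combine_stages(man_scores, nli_probs).argmax(dim=-1)
print(predicted)                                  # predicted option index per example
```
In practice, the passage token states and pooled option vectors would come from a pretrained multilingual or Vietnamese encoder (an XLM-R- or PhoBERT-style model, for example), and the entailment probabilities from an NLI classifier over the same encoder; the fusion weight would be tuned on a development set.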
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z) - Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education [0.16317061277457]
We evaluate the ability of large language models (LLMs) to perform multiple choice symbol binding (MCSB) for multiple choice question answering (MCQA) tasks in zero-shot, one-shot, and few-shot settings.
This dataset can be used to evaluate the MCSB ability of LLMs and smaller language models (LMs) because it is typed in a strict style.
arXiv Detail & Related papers (2023-10-18T15:48:07Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - Sentence Extraction-Based Machine Reading Comprehension for Vietnamese [0.2446672595462589]
We introduce the UIT-ViWikiQA, the first dataset for evaluating sentence extraction-based machine reading comprehension in Vietnamese language.
The dataset comprises 23,074 question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles.
Our experiments show that the best machine model is XLM-R Large, which achieves an exact match (EM) score of 85.97% and an F1-score of 88.77% on our dataset.
arXiv Detail & Related papers (2021-05-19T10:22:27Z) - Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts [0.2446672595462589]
We present a new Vietnamese corpus for conversational machine reading comprehension (UIT-ViCoQA).
UIT-ViCoQA consists of 10,000 questions with answers over 2,000 conversations about health news articles.
The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement.
arXiv Detail & Related papers (2021-05-04T14:50:39Z) - MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics [55.85042753772513]
We introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations (MOCHA).
Using MOCHA, we train a Learned Evaluation metric for Reading Comprehension, LERC, to mimic human judgement scores. LERC outperforms baseline metrics by 10 to 36 absolute Pearson points on held-out annotations.
When we evaluate on minimal pairs, LERC achieves 80% accuracy, outperforming baselines by 14 to 26 absolute percentage points while leaving significant room for improvement.
arXiv Detail & Related papers (2020-10-07T20:22:54Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - An Experimental Study of Deep Neural Network Models for Vietnamese Multiple-Choice Reading Comprehension [2.7528170226206443]
We conduct experiments on neural network-based models to understand the impact of word representations on machine reading comprehension.
Our experiments include using the Co-match model with six different Vietnamese word embeddings and the BERT model for multiple-choice reading comprehension.
On the ViMMRC corpus, the accuracy of the BERT model is 61.28% on the test set.
arXiv Detail & Related papers (2020-08-20T07:29:14Z) - A Sentence Cloze Dataset for Chinese Machine Reading Comprehension [64.07894249743767]
We propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC).
The proposed task aims to fill the right candidate sentence into the passage that has several blanks.
We built a Chinese dataset called CMRC 2019 to evaluate the difficulty of the SC-MRC task.
arXiv Detail & Related papers (2020-04-07T04:09:00Z) - Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension [2.5199066832791535]
We construct a dataset which consists of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts.
We propose a lexical-based MRC method that utilizes semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text.
Our proposed method achieves 61.81% accuracy, which is 5.51% higher than the best baseline model.
arXiv Detail & Related papers (2020-01-16T08:09:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.