Conversational Machine Reading Comprehension for Vietnamese Healthcare
Texts
- URL: http://arxiv.org/abs/2105.01542v2
- Date: Wed, 5 May 2021 01:48:26 GMT
- Title: Conversational Machine Reading Comprehension for Vietnamese Healthcare
Texts
- Authors: Son T. Luu, Mao Nguyen Bui, Loi Duc Nguyen, Khiem Vinh Tran, Kiet Van
Nguyen, Ngan Luu-Thuy Nguyen
- Abstract summary: We present a new Vietnamese corpus for conversational machine reading comprehension (UIT-ViCoQA).
UIT-ViCoQA consists of 10,000 questions with answers over 2,000 conversations about health news articles.
The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement.
- Score: 0.2446672595462589
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine reading comprehension (MRC) is a sub-field in natural language
processing or computational linguistics. MRC aims to help computers understand
unstructured texts and then answer questions related to them. In this paper, we
present a new Vietnamese corpus for conversational machine reading
comprehension (UIT-ViCoQA), consisting of 10,000 questions with answers over
2,000 conversations about health news articles. We analyze UIT-ViCoQA in depth
with different linguistic aspects. Then, we evaluate several baseline dialogue
and reading comprehension models on the UIT-ViCoQA corpus. The best
model obtains an F1 score of 45.27%, which is 30.91 points behind human
performance (76.18%), indicating that there is ample room for improvement.
Related papers
- How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading [60.19226384241482]
We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles.
We explore various approaches to generate such questions using language models.
We conduct a human study to understand the implication of such questions on reading comprehension.
arXiv Detail & Related papers (2024-07-19T13:42:56Z)
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering [58.92057773071854]
We introduce MTVQA, the first benchmark featuring high-quality human expert annotations across 9 diverse languages.
arXiv Detail & Related papers (2024-05-20T12:35:01Z)
- VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension [1.3942150186842373]
This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension tasks.
The existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks.
In contrast, VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube.
arXiv Detail & Related papers (2024-02-05T00:54:40Z)
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- A Multiple Choices Reading Comprehension Corpus for Vietnamese Language
Education [2.5199066832791535]
ViMMRC 2.0 extends the earlier ViMMRC corpus for multiple-choice reading comprehension on Vietnamese textbooks.
The dataset has 699 reading passages (prose and poems) and 5,273 questions.
Our multi-stage models achieved 58.81% accuracy on the test set, 5.34% higher than the best BERTology models.
arXiv Detail & Related papers (2023-03-31T15:54:54Z)
- RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z)
- Sentence Extraction-Based Machine Reading Comprehension for Vietnamese [0.2446672595462589]
We introduce UIT-ViWikiQA, the first dataset for evaluating sentence extraction-based machine reading comprehension in Vietnamese.
The dataset comprises 23,074 question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia.
Our experiments show that the best machine model is XLM-R$_{Large}$, which achieves an exact match (EM) score of 85.97% and an F1-score of 88.77% on our dataset.
arXiv Detail & Related papers (2021-05-19T10:22:27Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question
Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- An Experimental Study of Deep Neural Network Models for Vietnamese
Multiple-Choice Reading Comprehension [2.7528170226206443]
We conduct experiments on neural network-based models to understand the impact of word representation on machine reading comprehension.
Our experiments include using the Co-match model on six different Vietnamese word embeddings and the BERT model for multiple-choice reading comprehension.
On the ViMMRC corpus, the BERT model achieves an accuracy of 61.28% on the test set.
arXiv Detail & Related papers (2020-08-20T07:29:14Z)
- New Vietnamese Corpus for Machine Reading Comprehension of Health News
Articles [2.5199066832791535]
This paper presents ViNewsQA as a new corpus for the Vietnamese language to evaluate healthcare reading comprehension models.
The corpus comprises 22,057 human-generated question-answer pairs.
Our experiments show that the best machine model is ALBERT, which achieves an exact match score of 65.26% and an F1-score of 84.89% on our corpus.
arXiv Detail & Related papers (2020-06-19T13:49:26Z)