ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development
- URL: http://arxiv.org/abs/2304.14405v1
- Date: Thu, 27 Apr 2023 17:59:53 GMT
- Title: ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development
- Authors: Ta Duc Huy, Nguyen Anh Tu, Tran Hoang Vu, Nguyen Phuc Minh, Nguyen Phan, Trung H. Bui, Steven Q. H. Truong
- Abstract summary: We publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations.
We propose a simple self-supervised training strategy with span-noise modelling that improves the performance.
- Score: 1.4315915057750197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing medical text datasets usually take the form of question and answer pairs that support the task of natural language generation, but lack composite annotations of the medical terms. In this study, we publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. The tag sets for the two tasks are in the medical domain and can facilitate the development of task-oriented healthcare chatbots with better comprehension of queries from patients. We train baseline models for the two tasks and propose a simple self-supervised training strategy with span-noise modelling that substantially improves the performance. Dataset and code will be published at https://github.com/tadeephuy/ViMQ
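The abstract does not spell out the annotation schema or the span-noise procedure. Below is a minimal sketch of one plausible reading: each record carries a sentence-level intent plus character-offset entity spans, and entity boundaries are randomly jittered during training so the NER model becomes robust to imprecise spans. The field names, tag set, and `add_span_noise` function are illustrative assumptions, not the released ViMQ format.

```python
import random

# Hypothetical ViMQ-style record: a sentence-level intent label plus
# character-offset entity spans. Field names and the tag set are
# assumptions for illustration, not the released ViMQ schema.
record = {
    "text": "Tôi bị đau đầu và sốt nhẹ",  # "I have a headache and a mild fever"
    "intent": "symptom_description",
    "entities": [
        {"start": 7, "end": 14, "label": "SYMPTOM"},   # "đau đầu" (headache)
        {"start": 18, "end": 25, "label": "SYMPTOM"},  # "sốt nhẹ" (mild fever)
    ],
}

def add_span_noise(entities, text_len, max_shift=2, p=0.5):
    """Jitter entity span boundaries by up to `max_shift` characters.

    One plausible reading of "span-noise modelling": expose the NER
    model to perturbed span boundaries during training so it becomes
    robust to imprecise entity annotations.
    """
    noisy = []
    for ent in entities:
        start, end = ent["start"], ent["end"]
        if random.random() < p:
            start = max(0, start + random.randint(-max_shift, max_shift))
            end = min(text_len, end + random.randint(-max_shift, max_shift))
        if start < end:  # discard spans that collapsed under the noise
            noisy.append({**ent, "start": start, "end": end})
    return noisy

noisy_entities = add_span_noise(record["entities"], len(record["text"]))
print(noisy_entities)
```

In such a setup, the NER baseline would be trained on a mixture of clean and perturbed spans; the paper and repository should be consulted for the actual strategy.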
Related papers
- MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations [23.437292621092823]
We introduce MediTOD, a dataset of doctor-patient dialogues in English for the medical history-taking task.
We devise a questionnaire-based labeling scheme tailored to the medical domain.
Then, medical professionals create the dataset with high-quality comprehensive annotations.
arXiv Detail & Related papers (2024-10-18T06:38:22Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare unified-format labeling, data splits, and evaluation metrics for the new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
arXiv Detail & Related papers (2022-01-17T16:23:33Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines struggle on both tasks, performing poorly on our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)
- Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations [0.6875312133832078]
We bridge the gap in classifying utterances from medical conversations according to (i) the SOAP section and (ii) the speaker role.
We present a systematic analysis in which we adapt an existing deep learning architecture to the two aforementioned tasks.
The results suggest that modelling context in a hierarchical manner, which captures both word- and utterance-level context, yields substantial improvements on both classification tasks; a rough sketch of this idea follows.
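The paper's exact architecture is not described in this summary; the sketch below is a generic illustration of hierarchical context modelling, assuming PyTorch and made-up dimensions: a word-level encoder produces one vector per utterance, and an utterance-level encoder contextualizes those vectors across the conversation before classifying each utterance.

```python
import torch
import torch.nn as nn

class HierarchicalUtteranceClassifier(nn.Module):
    """Illustrative two-level encoder (not the paper's exact model):
    a word-level GRU builds one vector per utterance, and an
    utterance-level GRU contextualizes it within the conversation
    before classifying each utterance (e.g., into SOAP sections)."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.utt_enc = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.clf = nn.Linear(hid_dim, n_classes)

    def forward(self, conv_tokens):
        # conv_tokens: (n_utterances, max_words) token ids for one conversation
        emb = self.embed(conv_tokens)         # (U, W, E)
        _, h = self.word_enc(emb)             # h: (1, U, H), one vector per utterance
        utt_vecs = h.squeeze(0).unsqueeze(0)  # (1, U, H), utterances as a sequence
        ctx, _ = self.utt_enc(utt_vecs)       # contextualize across the conversation
        return self.clf(ctx.squeeze(0))       # (U, n_classes) per-utterance logits
```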
arXiv Detail & Related papers (2020-07-17T04:19:30Z)
- Self-Attention Enhanced Patient Journey Understanding in Healthcare System [43.11457142941327]
MusaNet is designed to learn representations of patient journeys, which are typically long sequences of activities.
MusaNet is trained in an end-to-end manner on training data derived from EHRs.
Results demonstrate that MusaNet produces higher-quality representations than state-of-the-art baseline methods.
arXiv Detail & Related papers (2020-06-15T10:32:36Z)
- Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)