MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation

URL: http://arxiv.org/abs/2010.07497v2
Date: Sun, 31 Jul 2022 06:04:16 GMT
Title: MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation
Authors: Wenge Liu, Jianheng Tang, Yi Cheng, Wenjie Li, Yefeng Zheng, Xiaodan Liang
Abstract summary: We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases. We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation. Experimental results show that pre-trained language models and other baselines perform poorly on both tasks on our dataset.
Score: 86.38736781043109
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Developing conversational agents to interact with patients and provide primary clinical advice has attracted increasing attention due to its huge application potential, especially during the COVID-19 pandemic. However, the training of end-to-end neural medical dialogue systems is restricted by the insufficient quantity of medical dialogue corpora. In this work, we make the first attempt to build and release a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases, named MedDG, with more than 17K conversations collected from an online health consultation community. Five categories of entities (diseases, symptoms, attributes, tests, and medicines) are annotated in each conversation of MedDG as additional labels. To push forward future research on building expert-sensitive medical dialogue systems, we propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation. To gain a clear understanding of these two tasks, we implement several state-of-the-art benchmarks and design two dialogue models that additionally take the predicted entities into account. Experimental results show that pre-trained language models and other baselines perform poorly on both tasks on our dataset, and that response quality can be enhanced with the help of auxiliary entity information. In human evaluation, the simple retrieval model outperforms several state-of-the-art generative models, indicating that considerable room remains for improvement in generating medically meaningful responses.

Related papers

MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z)
MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations [23.437292621092823]
We introduce MediTOD, a dataset of doctor-patient dialogues in English for the medical history-taking task. We devise a questionnaire-based labeling scheme tailored to the medical domain. Then, medical professionals create the dataset with high-quality comprehensive annotations.
arXiv Detail & Related papers (2024-10-18T06:38:22Z)
Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed. In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks. We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
Medical Dialogue Generation via Dual Flow Modeling [9.328694317877169]
Medical dialogue systems (MDS) aim to provide patients with medical services, such as diagnosis and prescription. Previous studies mainly addressed this by extracting the mentioned medical entities as critical dialogue history information. In this work, we argue that it is also essential to capture the transitions of the medical entities and the doctor's dialogue acts in each turn.
arXiv Detail & Related papers (2023-05-29T14:23:34Z)
CDialog: A Multi-turn Covid-19 Conversation Dataset for Entity-Aware Dialog Generation [18.047064216849204]
We release a high-quality multi-turn Medical Dialog dataset relating to Covid-19 disease named CDialog. We propose a novel neural medical dialog system based on the CDialog dataset to advance future research on developing automated medical dialog systems.
arXiv Detail & Related papers (2022-11-16T11:07:34Z)
Medical Dialogue Response Generation with Pivotal Information Recalling [27.351688914399013]
We propose a medical response generation model with Pivotal Information Recalling (MedPIR). MedPIR is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator. Experimental results on two large-scale medical dialogue datasets show that MedPIR outperforms strong baselines in BLEU scores and medical entity F1.
arXiv Detail & Related papers (2022-06-17T08:11:10Z)
A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets [70.32630628211803]
We propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. A new large medical dialogue dataset with multi-level fine-grained annotations is introduced. We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies.
arXiv Detail & Related papers (2022-04-19T16:43:21Z)
M^2-MedDialog: A Dataset and Benchmarks for Multi-domain Multi-service Medical Dialogues [25.58066103487436]
Medical dialogue systems (MDSs) aim to assist doctors and patients with a range of professional medical services. No existing large-scale dialogue dataset contains both multiple medical services and fine-grained medical labels. We first build a Multiple-domain Multiple-service medical dialogue (M2-MedDialog) dataset, which contains 1,557 conversations between doctors and patients.
arXiv Detail & Related papers (2021-09-01T15:24:54Z)
Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions. We propose an end-to-end variational reasoning approach to medical dialogue generation. A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)
On the Generation of Medical Dialogues for COVID-19 [60.63485429268256]
People experiencing COVID-19-related symptoms or exposed to risk factors have a pressing need to consult doctors. Because of the shortage of medical professionals, many people cannot receive online consultations in a timely manner. We aim to develop a medical dialogue system that can provide COVID-19-related consultations.
arXiv Detail & Related papers (2020-05-11T21:23:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
