WangLab at MEDIQA-Chat 2023: Clinical Note Generation from
Doctor-Patient Conversations using Large Language Models
- URL: http://arxiv.org/abs/2305.02220v2
- Date: Sat, 3 Jun 2023 17:56:29 GMT
- Title: WangLab at MEDIQA-Chat 2023: Clinical Note Generation from
Doctor-Patient Conversations using Large Language Models
- Authors: John Giorgi, Augustin Toma, Ronald Xie, Sondra S. Chen, Kevin R. An,
Grace X. Zheng, Bo Wang
- Abstract summary: We describe our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations.
We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM).
Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes.
- Score: 2.3608256778747565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our submission to the MEDIQA-Chat 2023 shared task for
automatic clinical note generation from doctor-patient conversations. We report
results for two approaches: the first fine-tunes a pre-trained language model
(PLM) on the shared task data, and the second uses few-shot in-context learning
(ICL) with a large language model (LLM). Both achieve high performance as
measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and
first, respectively, of all submissions to the shared task. Expert human
scrutiny indicates that notes generated via the ICL-based approach with GPT-4
are preferred about as often as human-written notes, making it a promising path
toward automated note generation from doctor-patient conversations.
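In practice, the ICL approach amounts to prepending a few (conversation, note) exemplars to the prompt before the new conversation. A minimal sketch, assuming the OpenAI chat completions API; the paper's actual prompt design and exemplar-selection strategy are not reproduced here:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_note(exemplars, conversation, model="gpt-4"):
    """exemplars: list of (dialogue, clinical_note) pairs used as in-context shots."""
    messages = [{"role": "system",
                 "content": "You write clinical notes from doctor-patient conversations."}]
    # Few-shot in-context learning: each exemplar is shown as a solved example.
    for dialogue, note in exemplars:
        messages.append({"role": "user", "content": f"Conversation:\n{dialogue}"})
        messages.append({"role": "assistant", "content": note})
    # The new conversation goes last; the model continues the established pattern.
    messages.append({"role": "user", "content": f"Conversation:\n{conversation}"})
    response = client.chat.completions.create(model=model, messages=messages, temperature=0)
    return response.choices[0].message.content
```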
Related papers
- Improving Clinical Note Generation from Complex Doctor-Patient Conversation [20.2157016701399]
We present three key contributions to the field of clinical note generation using large language models (LLMs).
First, we introduce CliniKnote, a dataset consisting of 1,200 complex doctor-patient conversations paired with their full clinical notes.
Second, we propose K-SOAP, which enhances traditional SOAP (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information.
Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark various modern LLMs.
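A K-SOAP note, as described, is a standard SOAP note with a keyword section prepended. A minimal sketch of how such a record might be represented; the field names are illustrative assumptions, not the paper's schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class KSOAPNote:
    """K-SOAP: a SOAP note with a keyword section at the top."""
    keywords: List[str] = field(default_factory=list)  # quick-scan terms, e.g. chief complaint, meds
    subjective: str = ""   # patient-reported history and symptoms
    objective: str = ""    # exam findings, vitals, lab results
    assessment: str = ""   # diagnoses / differential
    plan: str = ""         # treatment, follow-up, referrals

    def render(self) -> str:
        # Keywords come first so essential information is visible at a glance.
        return "\n".join([
            "KEYWORDS: " + ", ".join(self.keywords),
            "S: " + self.subjective,
            "O: " + self.objective,
            "A: " + self.assessment,
            "P: " + self.plan,
        ])
```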
arXiv Detail & Related papers (2024-08-26T18:39:31Z)
- Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting [0.49478969093606673]
A two-shot prompting approach combined with scope and domain context outperforms the other methods tested.
The automated reports are approximately twice as long as the human references.
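A rough sketch of what such a prompt might look like: domain context and scope instructions followed by two worked examples. The wording below is an assumption for illustration, not the paper's actual prompt:

```python
# Hypothetical two-shot prompt assembly; the context/scope wording and the
# example pairs are placeholders, not the paper's engineered prompt.
DOMAIN_CONTEXT = (
    "You are a medical scribe. Summarize the consultation transcript "
    "into a concise medical report."
)
SCOPE = "Cover only symptoms, findings, diagnosis, and advice; omit small talk."

def build_two_shot_prompt(examples, transcript):
    """examples: two (transcript, report) pairs; transcript: the new input."""
    shots = "\n\n".join(
        f"Transcript:\n{t}\nReport:\n{r}" for t, r in examples
    )
    return f"{DOMAIN_CONTEXT}\n{SCOPE}\n\n{shots}\n\nTranscript:\n{transcript}\nReport:\n"
```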
arXiv Detail & Related papers (2023-11-22T09:51:53Z)
- Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues [17.41334279810008]
We investigate the use of large language models (LLMs) like ChatGPT for document-grounded response generation in the context of information-seeking dialogues.
For evaluation, we use the MultiDoc2Dial corpus of task-oriented dialogues in four social service domains.
While both ChatGPT variants are more likely to include information not present in the relevant segments, possibly indicating hallucination, they are rated higher than both the shared-task-winning system and the human responses.
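One simple way to flag such ungrounded content is to check which response tokens never appear in the grounding segments. A crude lexical sketch of that idea; the paper itself relied on human raters, not this heuristic:

```python
import re

def ungrounded_terms(response: str, segments: list[str]) -> set[str]:
    """Return content words in the response that appear in none of the
    grounding segments -- a rough lexical proxy for hallucinated content.
    No stemming, so morphological variants (renew/renewed) are missed."""
    tokenize = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    grounded = set().union(*(tokenize(seg) for seg in segments))
    stopwords = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}
    return tokenize(response) - grounded - stopwords

# Example: any term returned here was never seen in the source segments.
print(ungrounded_terms(
    "You can renew the permit online for a small fee.",
    ["Permits can be renewed online at no cost."],
))
```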
arXiv Detail & Related papers (2023-09-21T07:28:03Z)
- GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from Doctor-Patient Conversations through Fine-tuning and In-context Learning [4.2570830892708225]
This paper presents our contribution to the MEDIQA-2023 Dialogue2Note shared task, encompassing both subtask A and subtask B.
We approach the task as a dialogue summarization problem and implement two distinct pipelines: (a) fine-tuning a pre-trained dialogue summarization model and GPT-3, and (b) few-shot in-context learning (ICL) with a large language model, GPT-4.
Both methods achieve excellent results in terms of ROUGE-1 F1, BERTScore F1 (deberta-xlarge-mnli), and BLEURT.
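These metrics can be computed with the Hugging Face evaluate library. A sketch, assuming the rouge, bertscore, and bleurt metric dependencies are installed and that BLEURT's default checkpoint is acceptable:

```python
import evaluate  # pip install evaluate rouge_score bert_score

predictions = ["Patient reports a dry cough for two weeks; advised chest X-ray."]
references = ["The patient has had a dry cough for two weeks; a chest X-ray was ordered."]

rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references)["rouge1"])

bertscore = evaluate.load("bertscore")
# deberta-xlarge-mnli as the BERTScore backbone, as named in the abstract.
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="microsoft/deberta-xlarge-mnli")["f1"])

# BLEURT needs an extra install (pip install git+https://github.com/google-research/bleurt.git)
# and downloads its default checkpoint on first use.
bleurt = evaluate.load("bleurt")
print(bleurt.compute(predictions=predictions, references=references)["scores"])
```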
arXiv Detail & Related papers (2023-05-08T19:16:26Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets [70.32630628211803]
We propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction.
A new large medical dialogue dataset with multi-level fine-grained annotations is introduced.
We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies.
arXiv Detail & Related papers (2022-04-19T16:43:21Z)
- Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BERTScore.
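The character-level Levenshtein distance is straightforward to implement. A minimal dynamic-programming sketch, not the authors' exact code:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

# Normalized to [0, 1] so scores are comparable across notes of different lengths.
def levenshtein_similarity(a: str, b: str) -> float:
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)
```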
arXiv Detail & Related papers (2022-04-01T14:04:16Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and a version of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
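As an illustration of the general word-simplification idea (not the authors' pipeline), a masked language model can propose in-context substitutes for a difficult term; the model and example below are assumptions:

```python
from transformers import pipeline  # pip install transformers

# Generic masked LM; the paper trained its own model on medical forum data.
fill = pipeline("fill-mask", model="distilroberta-base")

sentence = "The patient presented with severe cephalalgia."
masked = sentence.replace("cephalalgia", fill.tokenizer.mask_token)

# Candidate replacements ranked by the language model; a real system would
# also filter for simplicity (e.g., word frequency) and meaning preservation.
for cand in fill(masked, top_k=5):
    print(cand["token_str"].strip(), round(cand["score"], 3))
```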
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
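For context, a bi-encoder scores query-passage pairs by embedding each side independently and comparing the vectors. A generic sketch with sentence-transformers; this illustrates the architecture, not CAPR's rule-based training objective:

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

# Generic pre-trained bi-encoder; CAPR would instead train the encoder
# with its rule-based self-supervision objective.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "What anticoagulant is the patient taking?"
passages = [
    "Patient was started on warfarin 5 mg daily for atrial fibrillation.",
    "Denies chest pain, shortness of breath, or palpitations.",
]

# Encode query and passages separately, then rank passages by cosine similarity.
q_emb = model.encode(query, convert_to_tensor=True)
p_emb = model.encode(passages, convert_to_tensor=True)
scores = util.cos_sim(q_emb, p_emb)[0]
best = int(scores.argmax())
print(passages[best], float(scores[best]))
```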
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG.
We propose two kinds of medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines struggle on both tasks, performing poorly on our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)