PULSAR: Pre-training with Extracted Healthcare Terms for Summarising
Patients' Problems and Data Augmentation with Black-box Large Language Models
- URL: http://arxiv.org/abs/2306.02754v1
- Date: Mon, 5 Jun 2023 10:17:50 GMT
- Title: PULSAR: Pre-training with Extracted Healthcare Terms for Summarising
Patients' Problems and Data Augmentation with Black-box Large Language Models
- Authors: Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Thanh-Tung
Nguyen, Abhinav Ramesh Kashyap, Xiaojun Zeng, Daniel Beck, Stefan Winkler,
Goran Nenadic
- Abstract summary: Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias.
BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation.
One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list.
Our approach was ranked second among all submissions to the shared task.
- Score: 25.363775123262307
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical progress notes play a crucial role in documenting a patient's
hospital journey, including his or her condition, treatment plan, and any
updates for healthcare providers. Automatic summarisation of a patient's
problems in the form of a problem list can aid stakeholders in understanding a
patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared
Task 1A focuses on generating a list of diagnoses and problems from the
provider's progress notes during hospitalisation. In this paper, we introduce
our proposed approach to this task, which integrates two complementary
components. One component employs large language models (LLMs) for data
augmentation; the other is an abstractive summarisation LLM with a novel
pre-training objective for generating the patients' problems summarised as a
list. Our approach was ranked second among all submissions to the shared task.
The performance of our model on the development and test datasets shows that
our approach is more robust on unknown data, with an improvement of up to 3.1
points over the same size of the larger model.
Related papers
- NOTE: Notable generation Of patient Text summaries through Efficient
approach based on direct preference optimization [0.0]
"NOTE" stands for "Notable generation Of patient Text summaries through an Efficient approach based on direct preference optimization".
Patient events are sequentially combined and used to generate a discharge summary for each hospitalization.
NOTE can be used to generate not only discharge summaries but also other summaries throughout a patient's journey.
arXiv Detail & Related papers (2024-02-19T06:43:25Z)
- README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on
Summarizing Patients' Active Diagnoses and Problems from Electronic Health
Record Progress Notes [5.222442967088892]
The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum).
The goal for participants is to develop models that generate a list of diagnoses and problems using input from the daily care notes collected from the hospitalization of critically ill patients.
Eight teams submitted their final systems to the shared task leaderboard.
arXiv Detail & Related papers (2023-06-08T15:19:57Z)
- MedNgage: A Dataset for Understanding Engagement in Patient-Nurse
Conversations [4.847266237348932]
Patients who effectively manage their symptoms often demonstrate higher levels of engagement in conversations and interventions with healthcare practitioners.
It is crucial for AI systems to understand the engagement in natural conversations between patients and practitioners to better contribute toward patient care.
We present a novel dataset (MedNgage) which consists of patient-nurse conversations about cancer symptom management.
arXiv Detail & Related papers (2023-05-31T16:06:07Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of
Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Summarizing Patients Problems from Hospital Progress Notes Using
Pre-trained Sequence-to-Sequence Models [9.879960506853145]
Problem list summarization requires a model to understand, abstract, and generate clinical documentation.
We propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization.
arXiv Detail & Related papers (2022-08-17T17:07:35Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for
Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware
Medical Dialogue Generation [86.38736781043109]
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two kinds of medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines perform poorly on both tasks in our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.