Summarizing Patients Problems from Hospital Progress Notes Using
Pre-trained Sequence-to-Sequence Models
- URL: http://arxiv.org/abs/2208.08408v1
- Date: Wed, 17 Aug 2022 17:07:35 GMT
- Title: Summarizing Patients Problems from Hospital Progress Notes Using
Pre-trained Sequence-to-Sequence Models
- Authors: Yanjun Gao, Dmitry Dligach, Timothy Miller, Dongfang Xu, Matthew M.
Churpek, Majid Afshar
- Abstract summary: Problem list summarization requires a model to understand, abstract, and generate clinical documentation.
We propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization.
- Score: 9.879960506853145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatically summarizing patients' main problems from daily progress notes
using natural language processing methods helps to battle against information
and cognitive overload in hospital settings and potentially assists providers
with computerized diagnostic decision support. Problem list summarization
requires a model to understand, abstract, and generate clinical documentation.
In this work, we propose a new NLP task that aims to generate a list of
problems in a patient's daily care plan using input from the provider's
progress notes during hospitalization. We investigate the performance of T5 and
BART, two state-of-the-art seq2seq transformer architectures, in solving this
problem. We provide a corpus built on top of progress notes from publicly
available electronic health record progress notes in the Medical Information
Mart for Intensive Care (MIMIC)-III. T5 and BART are trained on general domain
text, and we experiment with a data augmentation method and a domain adaptation
pre-training method to increase exposure to medical vocabulary and knowledge.
Evaluation methods include ROUGE, BERTScore, cosine similarity on sentence
embedding, and F-score on medical concepts. Results show that T5 with domain
adaptive pre-training achieves significant performance gains compared to a
rule-based system and general domain pre-trained language models, indicating a
promising direction for tackling the problem summarization task.
Related papers
- Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA [1.137357582959183]
This study focuses on the transformative capabilities of the fine-tuned KoELECTRA model in comparison to the GPT-4 model.
The study leverages advanced natural language processing (NLP) techniques to foster a paradigm shift towards more sophisticated data processing systems.
arXiv Detail & Related papers (2024-06-12T06:44:05Z) - README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z) - PULSAR: Pre-training with Extracted Healthcare Terms for Summarising
Patients' Problems and Data Augmentation with Black-box Large Language Models [25.363775123262307]
Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias.
BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation.
One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list.
Our approach was ranked second among all submissions to the shared task.
arXiv Detail & Related papers (2023-06-05T10:17:50Z) - Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z) - Hierarchical Annotation for Building A Suite of Clinical Natural
Language Processing Tasks: Progress Note Understanding [4.5939673461957335]
This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization.
We created an annotated corpus based on an extensive collection of publicly available daily progress notes.
We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages.
arXiv Detail & Related papers (2022-04-06T18:38:08Z) - Towards more patient friendly clinical notes through language models and
ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling.
We use a new dataset pairs of publicly available medical sentences and a version of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for
Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z) - Benchmarking Automated Clinical Language Simplification: Dataset,
Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z) - Towards an Automated SOAP Note: Classifying Utterances from Medical
Conversations [0.6875312133832078]
We bridge the gap for classifying utterances from medical conversations according to (i) the SOAP section and (ii) the speaker role.
We present a systematic analysis in which we adapt an existing deep learning architecture to the two aforementioned tasks.
The results suggest that modelling context in a hierarchical manner, which captures both word and utterance level context, yields substantial improvements on both classification tasks.
arXiv Detail & Related papers (2020-07-17T04:19:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.