Enhanced Electronic Health Records Text Summarization Using Large Language Models
- URL: http://arxiv.org/abs/2410.09628v1
- Date: Sat, 12 Oct 2024 19:36:41 GMT
- Title: Enhanced Electronic Health Records Text Summarization Using Large Language Models
- Authors: Ruvarashe Madzime, Clement Nyirenda,
- Abstract summary: This project builds on prior work by creating a system that generates clinician-preferred, focused summaries.
The proposed system leverages the Flan-T5 model to generate tailored EHR summaries based on clinician-specified topics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The development of Electronic Health Records summarization systems has revolutionized patient data management. Previous research advanced this field by adapting Large Language Models for clinical tasks, using diverse datasets to generate general EHR summaries. However, clinicians often require specific, focused summaries for quicker insights. This project builds on prior work by creating a system that generates clinician-preferred, focused summaries, improving EHR summarization for more efficient patient care. The proposed system leverages the Google Flan-T5 model to generate tailored EHR summaries based on clinician-specified topics. The approach involved fine-tuning the Flan-T5 model on an EHR question-answering dataset formatted in the Stanford Question Answering Dataset (SQuAD) style, which is a large-scale reading comprehension dataset with questions and answers. Fine-tuning utilized the Seq2SeqTrainer from the Hugging Face Transformers library with optimized hyperparameters. Key evaluation metrics demonstrated promising results: the system achieved an Exact Match (EM) score of 81.81%. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics showed strong performance, with ROUGE-1 at 96.03%, ROUGE-2 at 86.67%, and ROUGE-L at 96.10%. Additionally, the Bilingual Evaluation Understudy (BLEU) score was 63%, reflecting the model's coherence in generating summaries. By enhancing EHR summarization through LLMs, this project supports digital transformation efforts in healthcare, streamlining workflows, and enabling more personalized patient care.
Related papers
- Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports [2.932283627137903]
The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports for isocitrate dehydrogenase (IDH) mutation status.
arXiv Detail & Related papers (2024-09-15T15:21:45Z) - Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z) - EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models [21.637722557192482]
We introduce EHRmonize, a framework leveraging large language models to abstract medical concepts from EHR data.
Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks.
GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and 100% in performing binary classification of antibiotics.
arXiv Detail & Related papers (2024-06-28T21:39:20Z) - GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models [1.123722364748134]
This paper introduces GAMedX, a Named Entity Recognition (NER) approach utilizing Large Language Models (LLMs)
The methodology integrates open-source LLMs for NER, utilizing chained prompts and Pydantic schemas for structured output to navigate the complexities of specialized medical jargon.
The findings reveal significant ROUGE F1 score on one of the evaluation datasets with an accuracy of 98%.
arXiv Detail & Related papers (2024-05-31T02:53:22Z) - README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z) - Neural Summarization of Electronic Health Records [8.784162652042957]
We studied the viability of the automatic generation of various sections of a discharge summary using four state-of-the-art neural network summarization models.
Fine-tuning language models that were previously instruction fine-tuned showed better performance in summarizing entire reports.
arXiv Detail & Related papers (2023-05-24T15:05:53Z) - Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM)
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment
Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.