MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning
- URL: http://arxiv.org/abs/2501.14105v1
- Date: Thu, 23 Jan 2025 21:32:09 GMT
- Title: MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning
- Authors: Joshua Davis, Thomas Sounack, Kate Sciacca, Jessie M Brain, Brigitte N Durieux, Nicole D Agaronnik, Charlotta Lindvall
- Abstract summary: Fine-tuned open-source LLMs can surpass proprietary models in clinical note sectioning.
This study focuses on three sections: History of Present Illness, Interval History, and Assessment and Plan.
- Abstract: Extracting sections from clinical notes is crucial for downstream analysis but is challenging due to variability in formatting and the labor-intensive nature of manual sectioning. While proprietary large language models (LLMs) have shown promise, privacy concerns limit their accessibility. This study develops a pipeline for automated note sectioning using open-source LLMs, focusing on three sections: History of Present Illness, Interval History, and Assessment and Plan. We fine-tuned three open-source LLMs to extract sections using a curated dataset of 487 progress notes, comparing results against proprietary models (GPT-4o, GPT-4o mini). Internal and external validity were assessed via precision, recall, and F1 score. Fine-tuned Llama 3.1 8B outperformed GPT-4o (F1=0.92). On the external validity test set, performance remained high (F1=0.85). Fine-tuned open-source LLMs can surpass proprietary models in clinical note sectioning, offering advantages in cost, performance, and accessibility.
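The abstract reports precision, recall, and F1 for extracted sections. As a minimal sketch of how such span-level scores can be computed, the snippet below measures token-level overlap between an extracted section and its gold-standard counterpart; the function name and the token-level granularity are illustrative assumptions, not the paper's published evaluation code.

```python
from collections import Counter

def section_prf1(extracted: str, gold: str) -> tuple[float, float, float]:
    """Token-level precision/recall/F1 between an extracted section and the
    gold-standard section text (illustrative; the paper's protocol may differ)."""
    ext_tokens = Counter(extracted.split())
    gold_tokens = Counter(gold.split())
    # Count tokens appearing in both, respecting multiplicity.
    overlap = sum((ext_tokens & gold_tokens).values())
    precision = overlap / max(sum(ext_tokens.values()), 1)
    recall = overlap / max(sum(gold_tokens.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical usage on one note's "History of Present Illness" section.
p, r, f = section_prf1(
    extracted="Patient presents with chest pain for two days.",
    gold="Patient presents with chest pain radiating to the left arm for two days.",
)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```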
Related papers
- Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation [56.49084589053732]
Vision-Language Models (VLMs) have demonstrated success across diverse applications, yet their potential to assist in relevance judgments remains uncertain.
This paper assesses the relevance estimation capabilities of VLMs, including CLIP, LLaVA, and GPT-4V, within a large-scale ad hoc retrieval task tailored for multimedia content creation in a zero-shot fashion.
arXiv Detail & Related papers (2024-08-02T16:15:25Z)
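For the VLM relevance-judgment paper above, a zero-shot relevance score can be approximated with CLIP's image-text similarity. The sketch below uses the Hugging Face transformers CLIP API; the checkpoint choice and the idea of using raw cosine similarity as the relevance judgment are assumptions for illustration, not the paper's pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_relevance(query: str, image_path: str) -> float:
    """Cosine similarity between a text query and an image, as a zero-shot
    proxy for a relevance judgment."""
    image = Image.open(image_path)
    inputs = processor(text=[query], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # text_embeds and image_embeds are already L2-normalized by CLIPModel.
    return float((outputs.text_embeds @ outputs.image_embeds.T).item())

# Hypothetical usage: score = clip_relevance("a chest x-ray", "scan.png")
```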
- Closing the gap between open-source and commercial large language models for medical evidence summarization
Large language models (LLMs) hold great promise in summarizing medical evidence.
Most recent studies focus on the application of proprietary LLMs.
While open-source LLMs allow better transparency and customization, their performance lags behind that of proprietary ones.
arXiv Detail & Related papers (2024-07-25T05:03:01Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
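LLM-Select's core idea, prompting a model to rate each candidate feature's predictive value, can be sketched as below. The `ask_llm` helper stands in for any chat-completion call and is hypothetical, and the 0-to-1 scoring prompt is an assumption rather than the paper's exact prompt.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    raise NotImplementedError

def rank_features(target: str, features: list[str], keep: int) -> list[str]:
    # Ask the LLM to score each feature's predictive value for the target.
    scores = {}
    for feat in features:
        prompt = (
            f"On a scale from 0 to 1, how predictive is the feature '{feat}' "
            f"for the outcome '{target}'? Answer with a single number."
        )
        scores[feat] = float(ask_llm(prompt).strip())
    # Keep the highest-scoring features.
    return sorted(features, key=scores.get, reverse=True)[:keep]

# Hypothetical usage:
# rank_features("30-day readmission", ["age", "zip code", "hemoglobin"], keep=2)
```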
- RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness [102.06442250444618]
We introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm.
RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation.
Experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models.
arXiv Detail & Related papers (2024-05-27T14:37:01Z)
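The feedback-data-generation perspective mentioned above can be sketched as: sample several responses from the model being aligned, score them with an open-source judge, and keep best-versus-worst preference pairs. The `generate` and `judge_score` helpers below are hypothetical, and this is the generic recipe rather than RLAIF-V's published algorithm.

```python
def generate(prompt: str, n: int) -> list[str]:
    """Hypothetical: sample n candidate responses from the model being aligned."""
    raise NotImplementedError

def judge_score(prompt: str, response: str) -> float:
    """Hypothetical: trustworthiness score from an open-source judge model."""
    raise NotImplementedError

def build_preference_pairs(prompts: list[str], n: int = 4) -> list[dict]:
    pairs = []
    for prompt in prompts:
        candidates = generate(prompt, n)
        # Rank the model's own samples by the open-source judge's score.
        ranked = sorted(candidates, key=lambda r: judge_score(prompt, r))
        pairs.append({"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]})
    return pairs
```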
- Adapting Open-Source Large Language Models for Cost-Effective, Expert-Level Clinical Note Generation with On-Policy Reinforcement Learning
This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model.
We introduce a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model.
Our model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians.
arXiv Detail & Related papers (2024-04-25T15:34:53Z)
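A common shape for on-policy distillation of the kind DistillDirect performs is: sample a draft from the student, have the teacher produce an improved version, and prefer the teacher's output over the student's own sample. The sketch below follows that generic shape with hypothetical `student_generate` and `teacher_rewrite` helpers; DistillDirect's actual objective may differ.

```python
def student_generate(prompt: str) -> str:
    """Hypothetical: draw an on-policy sample from the student model."""
    raise NotImplementedError

def teacher_rewrite(prompt: str, draft: str) -> str:
    """Hypothetical: teacher model (e.g., Gemini 1.0 Pro) improves the draft."""
    raise NotImplementedError

def collect_onpolicy_pairs(prompts: list[str]) -> list[dict]:
    # Preferring the teacher's rewrite over the student's own sample keeps
    # the training data anchored to the student's policy (on-policy).
    data = []
    for prompt in prompts:
        draft = student_generate(prompt)
        improved = teacher_rewrite(prompt, draft)
        data.append({"prompt": prompt, "chosen": improved, "rejected": draft})
    return data
```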
- A Dataset and Benchmark for Hospital Course Summarization with Adapted Large Language Models [4.091402760759184]
Large language models (LLMs) show remarkable capabilities in automating real-world tasks, but their performance in healthcare applications has yet to be demonstrated.
We introduce a novel pre-processed dataset, MIMIC-IV-BHC, encapsulating clinical note and brief hospital course (BHC) pairs to adapt LLMs for BHC summarization.
Using clinical notes as input, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs and two proprietary LLMs.
arXiv Detail & Related papers (2024-03-08T23:17:55Z)
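The prompting-based adaptation described above (in-context learning) amounts to prepending a few (clinical note, BHC) exemplars to the target note. A minimal sketch, assuming the note/BHC pairs are available as Python dicts; the prompt wording is an illustrative assumption.

```python
def build_bhc_prompt(exemplars: list[dict], target_note: str, k: int = 2) -> str:
    """Few-shot (in-context learning) prompt for brief hospital course (BHC)
    summarization. Exemplars are {'note': ..., 'bhc': ...} pairs."""
    parts = ["Summarize each clinical note into a brief hospital course.\n"]
    for ex in exemplars[:k]:
        parts.append(f"Note:\n{ex['note']}\nBrief hospital course:\n{ex['bhc']}\n")
    parts.append(f"Note:\n{target_note}\nBrief hospital course:")
    return "\n".join(parts)

# Hypothetical usage: send build_bhc_prompt(train_pairs, new_note) to any LLM.
```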
- FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability [70.84333325049123]
This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats.
arXiv Detail & Related papers (2024-02-28T19:23:27Z)
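Format-following of the kind FoFo measures can be checked mechanically once the target format is machine-readable. A minimal sketch, assuming the required format is a JSON object with a fixed key set; FoFo itself covers many richer domain-specific formats.

```python
import json

REQUIRED_KEYS = {"patient_id", "diagnosis", "plan"}  # illustrative schema

def follows_format(model_output: str) -> bool:
    """True if the output parses as JSON and contains exactly the required keys."""
    try:
        obj = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj.keys()) == REQUIRED_KEYS

print(follows_format('{"patient_id": "123", "diagnosis": "x", "plan": "y"}'))  # True
print(follows_format("patient 123 has x; plan y"))                             # False
```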
- A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification [1.4715634464004446]
Large language models (LLMs) have demonstrated promising transfer learning capability.
LLMs demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need to curate large annotated datasets.
This may result in an increase in the utilization of NLP-based variables and outcomes in observational clinical studies.
arXiv Detail & Related papers (2024-01-25T02:05:31Z)
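The comparison above pits zero-shot LLM inference against supervised modeling. As a sketch of the supervised arm, the snippet below trains a standard TF-IDF plus logistic-regression classifier on labeled pathology reports; the data are placeholders, and the zero-shot arm would simply prompt an LLM with the same report text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Placeholder data: pathology report texts and binary labels.
train_texts = ["benign ductal tissue, no atypia", "invasive ductal carcinoma, grade 2"]
train_labels = [0, 1]
test_texts, test_labels = ["carcinoma identified in core biopsy"], [1]

# Conventional supervised baseline the zero-shot LLM is compared against.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)
print("F1:", f1_score(test_labels, clf.predict(test_texts)))
```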
- Distilling Large Language Models for Matching Patients to Clinical Trials [3.4068841624198942]
The recent success of large language models (LLMs) has paved the way for their adoption in the high-stakes domain of healthcare.
This study presents the first systematic examination of the efficacy of both proprietary (GPT-3.5 and GPT-4) and open-source LLMs (LLaMA 7B, 13B, and 70B) for the task of patient-trial matching.
Our findings reveal that open-source LLMs, when fine-tuned on this limited and synthetic dataset, demonstrate performance parity with their proprietary counterparts.
arXiv Detail & Related papers (2023-12-15T17:11:07Z)
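Fine-tuning an open-source model for patient-trial matching typically starts from instruction-style records pairing a patient summary and trial criteria with an eligibility label. A minimal sketch of such a JSONL writer; the field names and prompt template are assumptions, not the paper's released format.

```python
import json

def write_finetune_jsonl(examples: list[dict], path: str) -> None:
    """Each example: {'patient': ..., 'criteria': ..., 'eligible': bool}."""
    with open(path, "w") as f:
        for ex in examples:
            record = {
                "instruction": "Decide if the patient meets the trial criteria.",
                "input": f"Patient: {ex['patient']}\nCriteria: {ex['criteria']}",
                "output": "eligible" if ex["eligible"] else "not eligible",
            }
            f.write(json.dumps(record) + "\n")

# Hypothetical usage with synthetic examples, mirroring the study's setup:
# write_finetune_jsonl(synthetic_examples, "trial_matching.jsonl")
```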
- Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
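Calibration, one of the four reliability facets above, is commonly quantified with expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence to its accuracy. A standard numpy sketch of generic ECE, not the paper's exact protocol:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """ECE: confidence-weighted gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```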
- Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z)
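The hidden-Markov view behind CHMM-style aggregation treats the true tag sequence as hidden states and labeling-function votes as emissions, with sequence likelihood computed by the forward algorithm. Below is a toy forward pass over a fixed transition/emission parameterization; it illustrates only the generic machinery, not Sparse-CHMM's conditional, sparsity-regularized variant.

```python
import numpy as np

def forward_loglik(obs, log_pi, log_A, log_B):
    """Log-likelihood of an observation sequence under a discrete HMM.
    obs: observed symbol indices (e.g., labeling-function votes);
    log_pi: (S,) initial state log-probs; log_A: (S, S) transition log-probs;
    log_B: (S, O) emission log-probs."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # logsumexp over previous states for each current state.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Toy example: 2 hidden tags (O, ENTITY), 2 observable vote symbols.
log_pi = np.log([0.7, 0.3])
log_A = np.log([[0.8, 0.2], [0.4, 0.6]])
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])
print(forward_loglik([0, 1, 1], log_pi, log_A, log_B))
```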
This list is automatically generated from the titles and abstracts of the papers on this site.