Related papers: Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

URL: http://arxiv.org/abs/2409.10576v2
Date: Wed, 18 Sep 2024 13:27:43 GMT
Title: Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports
Authors: Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese,
Abstract summary: The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports for isocitrate dehydrogenase (IDH) mutation status.
Score: 2.932283627137903
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Purpose: To develop and evaluate an automated system for extracting structured clinical information from unstructured radiology and pathology reports using open-weights large language models (LMs) and retrieval augmented generation (RAG), and to assess the effects of model configuration variables on extraction performance. Methods and Materials: The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports annotated for isocitrate dehydrogenase (IDH) mutation status. An automated pipeline was developed to benchmark the performance of various LMs and RAG configurations. The impact of model size, quantization, prompting strategies, output formatting, and inference parameters was systematically evaluated. Results: The best performing models achieved over 98% accuracy in extracting BT-RADS scores from radiology reports and over 90% for IDH mutation status extraction from pathology reports. The top model being medical fine-tuned llama3. Larger, newer, and domain fine-tuned models consistently outperformed older and smaller models. Model quantization had minimal impact on performance. Few-shot prompting significantly improved accuracy. RAG improved performance for complex pathology reports but not for shorter radiology reports. Conclusions: Open LMs demonstrate significant potential for automated extraction of structured clinical data from unstructured clinical reports with local privacy-preserving application. Careful model selection, prompt engineering, and semi-automated optimization using annotated data are critical for optimal performance. These approaches could be reliable enough for practical use in research workflows, highlighting the potential for human-machine collaboration in healthcare data extraction.

Related papers

From Generative Modeling to Clinical Classification: A GPT-Based Architecture for EHR Notes [0.0]
This study presents a GPT-based architecture for clinical text classification.<n>Rather than updating all model parameters, the majority of the GPT-2 backbone is frozen.<n>The proposed method is evaluated on radiology reports from the MIMIC-IV-Note dataset.
arXiv Detail & Related papers (2026-01-29T16:33:47Z)
Automated interictal epileptic spike detection from simple and noisy annotations in MEG data [0.4737912324017801]
Magnetoencephalography (MEG) has been shown to be an effective exam to inform the localization of the epileptogenic zone.<n>Current automated methods are unsuitable for clinical practice.<n>In this work, we demonstrate that deep learning models can be used for detecting interictal spikes in MEG recordings.
arXiv Detail & Related papers (2025-10-24T16:02:05Z)
A Robust BERT-Based Deep Learning Model for Automated Cancer Type Extraction from Unstructured Pathology Reports [1.2546979106262524]
Fine-tuning domain-specific models for precision tasks in oncology may pave the way for more efficient and accurate clinical information extraction.<n>This model significantly outperformed the baseline model and a Large Language Model, Mistral 7B, achieving FBertscore 0.98 and overall exact match of 80.61%.
arXiv Detail & Related papers (2025-08-21T01:12:39Z)
Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform [0.6582858408923039]
The current study describes a radiology software platform called NeoMedSys that can enable efficient deployment and refinements of AI models.<n>We evaluated the feasibility and effectiveness of running NeoMedSys for three months in real-world clinical settings.
arXiv Detail & Related papers (2025-05-14T13:33:38Z)
Leveraging large language models for structured information extraction from pathology reports [0.0]
We evaluate large language models' accuracy in extracting structured information from breast cancer histopathology reports. Open-source tool for structured information extraction can be customized by non-programmers using natural language.
arXiv Detail & Related papers (2025-02-14T21:46:02Z)
Zero-shot generation of synthetic neurosurgical data with large language models [0.7373617024876725]
This study aims to evaluate the capability of zero-shot generation of synthetic neurosurgical data with a large language model (LLM), GPT-4o. Data synthesized with GPT-4o can effectively augment clinical data with small sample sizes, and train ML models for prediction of neurosurgical outcomes.
arXiv Detail & Related papers (2025-02-13T18:21:15Z)
Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data [0.0]
This pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated phenotyping from research survey data. We employed BERN2, a named entity recognition and normalization model, to extract information from the ORIGINS survey data. BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with Few Shot Inference and RAG orchestration, further improved accuracy.
arXiv Detail & Related papers (2024-10-28T02:55:03Z)
Enhanced Electronic Health Records Text Summarization Using Large Language Models [0.0]
This project builds on prior work by creating a system that generates clinician-preferred, focused summaries. The proposed system leverages the Flan-T5 model to generate tailored EHR summaries based on clinician-specified topics.
arXiv Detail & Related papers (2024-10-12T19:36:41Z)
MGH Radiology Llama: A Llama 3 70B Model for Radiology [50.42811030970618]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama.<n>It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2.<n>Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z)
Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options. The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation [37.20505633019773]
evaluating generated radiology reports is crucial for the development of radiology AI. This study proposes a novel evaluation framework using large language models (LLMs) to compare radiology reports for assessment.
arXiv Detail & Related papers (2024-04-01T09:02:12Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Our approach fuses image and textual data to enhance the generation process. We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians. Recent studies have achieved promising results in automatic impression generation using large-scale medical text data. These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
Event-based clinical findings extraction from radiology reports with pre-trained language model [0.22940141855172028]
We present a new corpus of radiology reports annotated with clinical findings. The gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT.
arXiv Detail & Related papers (2021-12-27T05:03:10Z)
Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation. adversarially generated samples are used during domain adaptation. Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure. This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients. Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.