Related papers: Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers

Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers

URL: http://arxiv.org/abs/2403.18938v1
Date: Wed, 27 Mar 2024 18:38:39 GMT
Title: Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers
Authors: Laura Bergomi, Tommaso M. Buonocore, Paolo Antonazzo, Lorenzo Alberghi, Riccardo Bellazzi, Lorenzo Preda, Chandra Bortolotto, Enea Parimbelli,
Abstract summary: structured reporting (SR) has been recommended by various medical societies. We propose a pipeline to extract information from free-text reports. Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models.
Score: 0.29530625605275984
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: BACKGROUND: Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness and information retrieval. We propose a pipeline to extract information from free-text radiology reports, that fits with the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. METHODS: Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 radiology reports, we investigate a rule-free generative Question Answering approach based on a domain-specific version of T5 (IT5). Two strategies (batch-truncation and ex-post combination) are implemented to comply with the model's context length limitations. Performance is evaluated in terms of strict accuracy, F1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. A 5-point Likert scale questionnaire is used to collect human-expert feedback on the similarity between medical annotations and generated answers. RESULTS: The combination of fine-tuning and batch splitting allows IT5 to achieve notable results; it performs on par with GPT-3.5 albeit its size being a thousand times smaller in terms of parameters. Human-based assessment scores show a high correlation (Spearman's correlation coefficients>0.88, p-values<0.001) with AI performance metrics (F1) and confirm the superior ability of LLMs (i.e., GPT-3.5, 175B of parameters) in generating plausible human-like statements.

Related papers

Leveraging large language models for structured information extraction from pathology reports [0.0]
We evaluate large language models' accuracy in extracting structured information from breast cancer histopathology reports. Open-source tool for structured information extraction can be customized by non-programmers using natural language.
arXiv Detail & Related papers (2025-02-14T21:46:02Z)
Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports [2.932283627137903]
The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports for isocitrate dehydrogenase (IDH) mutation status.
arXiv Detail & Related papers (2024-09-15T15:21:45Z)
RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, as Radiological Report (Text) Evaluation (RaTEScore) RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions. Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z)
The current status of large language models in summarizing radiology report impressions [13.402769727597812]
The effectiveness of large language models (LLMs) in summarizing radiology report impressions remains unclear. Three types of radiology reports, i.e., CT, PET-CT, and Ultrasound reports, are collected from Peking University Cancer Hospital and Institute. We use the report findings to construct the zero-shot, one-shot, and three-shot prompts with complete example reports to generate the impressions.
arXiv Detail & Related papers (2024-06-04T09:23:30Z)
Optimal path for Biomedical Text Summarization Using Pointer GPT [21.919661430250798]
GPT models have a tendency to generate factual errors, lack context, and oversimplify words. To address these limitations, we replaced the attention mechanism in the GPT model with a pointer network. The effectiveness of the Pointer-GPT model was evaluated using the ROUGE score.
arXiv Detail & Related papers (2024-03-22T02:13:23Z)
Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for Radiology Reports [22.599250713630333]
Our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs) Our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human and AI generated reports. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a 0.48 score, outperforming the METEOR metric by 0.19.
arXiv Detail & Related papers (2024-01-29T21:24:43Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Our approach fuses image and textual data to enhance the generation process. We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning. Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z)
Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models [0.9339914898177185]
RAG is an approach for automated radiology report writing that leverages multimodally aligned embeddings from a contrastively pretrained vision language model. Our approach achieves better clinical metrics with a BERTScore of 0.2865 (Delta+ 25.88%) and Semb score of 0.4026 (Delta+ 6.31%)
arXiv Detail & Related papers (2023-05-05T16:28:03Z)
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians. Recent studies have achieved promising results in automatic impression generation using large-scale medical text data. These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.