Related papers: RadioRAG: Factual Large Language Models for Enhanced Diagnostics in Radiology Using Dynamic Retrieval Augmented Generation


URL: http://arxiv.org/abs/2407.15621v1
Date: Mon, 22 Jul 2024 13:29:56 GMT
Title: RadioRAG: Factual Large Language Models for Enhanced Diagnostics in Radiology Using Dynamic Retrieval Augmented Generation
Authors: Soroosh Tayebi Arasteh, Mahshad Lotfinia, Keno Bressem, Robert Siepmann, Dyke Ferber, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
Abstract summary: Large language models (LLMs) have advanced the field of artificial intelligence (AI) in medicine. LLMs often generate outdated or inaccurate information based on static training datasets. We have developed Radiology RAG (RadioRAG) as an end-to-end framework that retrieves data from authoritative radiologic online sources in real-time.
Score: 1.8204982093237623
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have advanced the field of artificial intelligence (AI) in medicine. However, LLMs often generate outdated or inaccurate information based on static training datasets. Retrieval augmented generation (RAG) mitigates this by integrating outside data sources. While previous RAG systems used pre-assembled, fixed databases with limited flexibility, we have developed Radiology RAG (RadioRAG) as an end-to-end framework that retrieves data from authoritative radiologic online sources in real-time. RadioRAG is evaluated using a dedicated radiologic question-and-answer dataset (RadioQA). We evaluate the diagnostic accuracy of various LLMs when answering radiology-specific questions with and without access to additional online information via RAG. Using 80 questions from the RSNA Case Collection across radiologic subspecialties and 24 additional expert-curated questions, for which the correct gold-standard answers were available, LLMs (GPT-3.5-turbo, GPT-4, Mistral-7B, Mixtral-8x7B, and Llama3 [8B and 70B]) were prompted with and without RadioRAG. RadioRAG retrieved context-specific information from www.radiopaedia.org in real-time and incorporated it into its reply. RadioRAG consistently improved diagnostic accuracy across all LLMs, with relative improvements ranging from 2% to 54%. It matched or exceeded question answering without RAG across radiologic subspecialties, particularly in breast imaging and emergency radiology. However, the degree of improvement varied among models; GPT-3.5-turbo and Mixtral-8x7B-instruct-v0.1 saw notable gains, while Mistral-7B-instruct-v0.2 showed no improvement, highlighting variability in its effectiveness. LLMs benefit when provided access to domain-specific data beyond their training data. For radiology, RadioRAG establishes a robust framework that substantially improves diagnostic accuracy and factuality in radiological question answering.
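The "relative improvement" figures quoted in the abstract can be read as the change in accuracy divided by the baseline accuracy. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the function names are hypothetical, and the correct-answer counts (50 and 65) are made-up placeholders chosen only to illustrate the calculation. Only the question total (80 + 24) comes from the abstract.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def relative_improvement(baseline_acc: float, rag_acc: float) -> float:
    """Relative change of accuracy with RAG versus the no-RAG baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (rag_acc - baseline_acc) / baseline_acc


# 80 RSNA Case Collection questions + 24 expert-curated questions (from the abstract).
TOTAL_QUESTIONS = 80 + 24

# Hypothetical counts of correct answers, for illustration only.
baseline = accuracy(50, TOTAL_QUESTIONS)   # model prompted without RAG
with_rag = accuracy(65, TOTAL_QUESTIONS)   # same model prompted with RAG

gain = relative_improvement(baseline, with_rag)
print(f"relative improvement: {gain:.0%}")
```

A relative figure depends on the baseline: the same absolute gain in correct answers yields a larger relative improvement for a model that starts from a lower accuracy, which is one reason gains can differ widely across models.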

Related papers

Agentic large language models improve retrieval-based radiology question answering [4.340742745938289]
Agentic retrieval significantly improved mean diagnostic accuracy over zero-shot prompting. The greatest gains occurred in midsized models. Even clinically fine-tuned models exhibited meaningful improvements.
arXiv Detail & Related papers (2025-08-01T16:18:52Z)
TrackRAD2025 challenge dataset: Real-time tumor tracking for MRI-guided radiotherapy [2.45070347370137]
The dataset consists of sagittal 2D cine MRIs in 585 patients from six centers. By enabling more accurate motion management and adaptive treatment strategies, this dataset has the potential to advance the field of radiotherapy significantly.
arXiv Detail & Related papers (2025-03-24T20:14:42Z)
LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset [0.0]
A publicly available clinical dataset is presented, comprising MRI- and synthetic CT (sCT) images, target and OARs segmentations, and dose radiotherapy for 432 prostate cancer patients treated with MRI-guided radiotherapy. An extended dataset with 35 patients is also included, with the addition of deep learning (DL)-generated segmentations, DL segmentation uncertainty maps, and DL segmentations manually adjusted by four radiation oncologists. The publication aims to aid research within the fields of automated radiotherapy treatment planning, segmentation, inter-observer analyses, and DL model uncertainty investigation.
arXiv Detail & Related papers (2025-02-06T20:44:42Z)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback [10.826651024680169]
Radiologists play a crucial role by translating medical images into medical reports. While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy. We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation.
arXiv Detail & Related papers (2024-10-09T16:07:11Z)
Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset that covers diverse ERUS scenarios. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
MGH Radiology Llama: A Llama 3 70B Model for Radiology [27.575944159578786]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama. It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2. Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z)
RAD-PHI2: Instruction Tuning PHI-2 for Radiology [2.774342358600601]
Small Language Models (SLMs) have shown remarkable performance in general domain language understanding, reasoning and coding tasks. This study investigates the application of SLMs for general radiology knowledge, specifically question answering related to understanding of symptoms. By fine-tuning Phi-2 on both general domain tasks and radiology-specific tasks related to chest X-ray reports, we create Rad-Phi2.
arXiv Detail & Related papers (2024-03-12T17:27:22Z)
Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning [16.849933628738277]
Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists. This paper introduces a novel RRG method, LM-RRG, that integrates large models (LMs) with clinical quality reinforcement learning. Experiments on the MIMIC-CXR and IU-Xray datasets demonstrate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2024-03-11T13:47:11Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning. Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z)
Radiology-GPT: A Large Language Model for Radiology [74.07944784968372]
We introduce Radiology-GPT, a large language model for radiology. It demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication.
arXiv Detail & Related papers (2023-06-14T17:57:24Z)
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians. Recent studies have achieved promising results in automatic impression generation using large-scale medical text data. These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
Cross-Modal Causal Intervention for Medical Report Generation [107.76649943399168]
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance. Generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases. We propose a two-stage framework named CrossModal Causal Representation Learning (CMCRL). Experiments on IU-Xray and MIMIC-CXR show that our CMCRL pipeline significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
Radiomics-Guided Global-Local Transformer for Weakly Supervised Pathology Localization in Chest X-Rays [65.88435151891369]
Radiomics-Guided Transformer (RGT) fuses global image information with local knowledge-guided radiomics information. RGT consists of an image Transformer branch, a radiomics Transformer branch, and fusion layers that aggregate image and radiomic information.
arXiv Detail & Related papers (2022-07-10T06:32:56Z)
Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning [17.315387269810426]
We built a database containing more than 12,000 CXR scans and radiological reports. We developed a model based on deep convolutional neural network and recurrent network with attention mechanism. The model provides automated recognition of given scans and generation of reports.
arXiv Detail & Related papers (2020-06-18T08:12:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
