Exploring the Capabilities and Limitations of Large Language Models for Radiation Oncology Decision Support
- URL: http://arxiv.org/abs/2501.02346v1
- Date: Sat, 04 Jan 2025 17:57:33 GMT
- Title: Exploring the Capabilities and Limitations of Large Language Models for Radiation Oncology Decision Support
- Authors: Florian Putz, Marlen Haderleina, Sebastian Lettmaier, Sabine Semrau, Rainer Fietkau, Yixing Huang,
- Abstract summary: An attempt to assess GPT-4's performance in radiation oncology was made via a dedicated 100-question examination.<n>GPT-4's performance on a broader field of clinical radiation oncology is benchmarked by the ACR Radiation Oncology In-Training (TXIT) exam.<n>Its performance on re-labelling structure names in accordance with the AAPM TG-263 report has also been benchmarked, achieving above 96% accuracies.
- Score: 1.592751576537053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to the rapidly evolving integration of LLMs into decision-support tools, a significant transformation is happening across large-scale systems. Like other medical fields, the use of LLMs such as GPT-4 is gaining increasing interest in radiation oncology as well. An attempt to assess GPT-4's performance in radiation oncology was made via a dedicated 100-question examination on the highly specialized topic of radiation oncology physics, revealing GPT-4's superiority over other LLMs. GPT-4's performance on a broader field of clinical radiation oncology is further benchmarked by the ACR Radiation Oncology In-Training (TXIT) exam where GPT-4 achieved a high accuracy of 74.57%. Its performance on re-labelling structure names in accordance with the AAPM TG-263 report has also been benchmarked, achieving above 96% accuracies. Such studies shed light on the potential of LLMs in radiation oncology. As interest in the potential and constraints of LLMs in general healthcare applications continues to rise5, the capabilities and limitations of LLMs in radiation oncology decision support have not yet been fully explored.
Related papers
- A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology [12.872083704552258]
Lung cancer ranks as the leading cause of cancer-related mortality worldwide.
The Oncology Contouring Copilot system is developed to leverage oncologist expertise for precise tumor contouring.
arXiv Detail & Related papers (2025-03-19T06:41:37Z) - How Well Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
RadA-BenchPlat is an evaluation platform that benchmarks the performance of large language models (LLMs) in radiology environments.
The platform also defines ten categories of tools for agent-driven task solving and evaluates seven leading LLMs.
arXiv Detail & Related papers (2024-12-12T18:20:16Z) - BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports [9.739220217225435]
This study presents a pipeline for developing an in-house LLM to extract clinical information from radiology reports.
We first use GPT-4 to create a small labeled dataset, then fine-tune a Llama3-8B model on it.
Our findings demonstrate the feasibility of developing an in-house LLM that not only matches GPT-4's performance but also offers cost reductions and enhanced data privacy.
arXiv Detail & Related papers (2024-08-21T04:33:05Z) - MGH Radiology Llama: A Llama 3 70B Model for Radiology [50.42811030970618]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama.<n>It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2.<n>Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z) - GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z) - RadioRAG: Factual large language models for enhanced diagnostics in radiology using online retrieval augmented generation [1.7618750189510493]
Large language models (LLMs) often generate outdated or inaccurate information based on static training datasets.
We have developed Radiology RAG (RadioRAG), an end-to-end framework that retrieves data from authoritative radiologic online sources in real-time.
arXiv Detail & Related papers (2024-07-22T13:29:56Z) - End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding [47.360760580820966]
We present RO-LMM, a comprehensive large multimodal model (LMM) tailored for the field of radiation oncology.
This model effectively manages a series of tasks within the clinical workflow, including clinical context summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation.
We present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts LMM's robustness to noisy inputs while preserving the consistency of handling clean inputs.
arXiv Detail & Related papers (2023-11-27T14:49:06Z) - Exploring the Boundaries of GPT-4 in Radiology [46.30976153809968]
GPT-4 has a sufficient level of radiology knowledge with only occasional errors in complex context.
For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually-written impressions.
arXiv Detail & Related papers (2023-10-23T05:13:03Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z) - Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam
and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted
Medical Education and Decision Making in Radiation Oncology [7.094683738932199]
We evaluate the performance of ChatGPT-4 in radiation oncology using the 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases.
For the TXIT exam, ChatGPT-3.5 and ChatGPT-4 have achieved the scores of 63.65% and 74.57%, respectively.
ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than brachytherapy and dosimetry.
arXiv Detail & Related papers (2023-04-24T09:50:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.