Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback
- URL: http://arxiv.org/abs/2410.07025v1
- Date: Wed, 9 Oct 2024 16:07:11 GMT
- Title: Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback
- Authors: Dennis Hein, Zhihong Chen, Sophie Ostmeier, Justin Xu, Maya Varma, Eduardo Pontes Reis, Arne Edward Michalson, Christian Bluethgen, Hyun Joo Shin, Curtis Langlotz, Akshay S Chaudhari
- Abstract summary: Radiologists play a crucial role by translating medical images into medical reports.
While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy.
We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation.
- Score: 10.826651024680169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radiologists play a crucial role by translating medical images into medical reports. However, the field faces staffing shortages and increasing workloads. While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy. Most current VLMs in radiology rely solely on supervised fine-tuning (SFT). Meanwhile, in the general domain, additional preference fine-tuning has become standard practice. The challenge in radiology lies in the prohibitive cost of obtaining radiologist feedback. We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation. Our method leverages publicly available datasets with an LLM-as-a-Judge mechanism, eliminating the need for additional expert radiologist feedback. We evaluate and benchmark five direct alignment algorithms (DAAs). Our results show up to a 57.4% improvement in average GREEN scores, an LLM-based metric for evaluating CXR reports, and a 9.2% increase in the average across six metrics (domain-specific and general), compared to the SFT baseline. We study reward overoptimization via length exploitation, with reports lengthening by up to 3.2x. To assess a potential alignment tax, we benchmark on six additional diverse tasks, finding no significant degradations. A reader study involving four board-certified radiologists indicates win rates of up to 0.62 over the SFT baseline, while significantly penalizing verbosity. Our analysis provides actionable insights for the development of VLMs in high-stakes fields like radiology.
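The abstract names direct alignment algorithms (DAAs) trained on judge-ranked pairs as its core mechanism. Below is a minimal, illustrative sketch of one such DAA, direct preference optimization (DPO), combined with building a preference pair from an automated judge score (e.g., GREEN) instead of radiologist feedback. The tensor names, `beta` value, and pairing heuristic are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective over a batch of (chosen, rejected) report pairs.

    Each tensor holds the summed log-probability of a full report under
    the trainable policy or the frozen SFT reference model.
    """
    # Implicit rewards: how much the policy up-weights each report
    # relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference likelihood: push the policy to rank the
    # judge-preferred report above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def make_preference_pair(candidates: list[str], judge_scores: list[float]):
    """Rank sampled candidate reports by an automated judge score and
    return the best and worst as a (chosen, rejected) pair."""
    ranked = sorted(zip(judge_scores, candidates), reverse=True)
    return ranked[0][1], ranked[-1][1]
```

Since the pairwise objective rewards whatever the judge prefers, the length exploitation the abstract reports (reports up to 3.2x longer) is a failure mode worth monitoring during this kind of training.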
Related papers
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset covering diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
- RadioRAG: Factual Large Language Models for Enhanced Diagnostics in Radiology Using Dynamic Retrieval Augmented Generation [1.8204982093237623]
Large language models (LLMs) have advanced the field of artificial intelligence (AI) in medicine.
LLMs often generate outdated or inaccurate information based on static training datasets.
We have developed Radiology RAG (RadioRAG) as an end-to-end framework that retrieves data from authoritative radiologic online sources in real-time.
arXiv Detail & Related papers (2024-07-22T13:29:56Z)
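As a rough illustration of the retrieval-augmented pattern RadioRAG describes, the sketch below retrieves supporting passages and prepends them to the prompt so the model grounds its answer in current sources. The toy keyword retriever is a placeholder, not the paper's real-time pipeline over authoritative radiologic sources.

```python
def retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    # Toy keyword-overlap retriever standing in for RadioRAG's real-time
    # lookup of authoritative radiologic online sources.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:top_k]

def build_rag_prompt(question: str, corpus: list[str]) -> str:
    passages = retrieve(question, corpus)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    # Instructing the LLM to answer only from retrieved context is what
    # curbs outdated or hallucinated information from static training data.
    return (f"Answer using only the sources below.\n"
            f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:")
```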
- Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning [16.849933628738277]
Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists.
This paper introduces a novel RRG method, LM-RRG, that integrates large models (LMs) with clinical quality reinforcement learning.
Experiments on the MIMIC-CXR and IU-Xray datasets demonstrate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2024-03-11T13:47:11Z)
- How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses [14.884877292068351]
This study introduces a novel evaluation framework, named GPTRadScore.
It assesses the capabilities of multi-modal LLMs, such as GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, in generating descriptions for prospectively identified findings.
By employing a decomposition technique based on GPT-4, GPTRadScore compares these generated descriptions with gold-standard report sentences, analyzing their accuracy in terms of body part, location, and type of finding.
arXiv Detail & Related papers (2024-03-08T21:16:28Z)
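A hypothetical sketch of the attribute-level comparison GPTRadScore's decomposition implies: each generated finding is judged against the gold-standard sentence on body part, location, and finding type, then aggregated. The boolean-attribute representation and equal weighting are assumptions; the paper's exact scoring may differ.

```python
from dataclasses import dataclass

@dataclass
class FindingJudgment:
    """Per-finding verdicts produced by a judge LLM's decomposition."""
    body_part_correct: bool
    location_correct: bool
    type_correct: bool

def aggregate_score(judgments: list[FindingJudgment]) -> float:
    """Average per-finding accuracy over the three decomposed attributes."""
    if not judgments:
        return 0.0
    per_finding = [
        (j.body_part_correct + j.location_correct + j.type_correct) / 3
        for j in judgments
    ]
    return sum(per_finding) / len(per_finding)
```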
- Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for Radiology Reports [22.599250713630333]
Our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs).
Our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human- and AI-generated reports.
Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a 0.48 score, outperforming the METEOR metric by 0.19.
arXiv Detail & Related papers (2024-01-29T21:24:43Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z)
- Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation [7.586632627817609]
Radiologists face high burnout rates, partly due to the increasing volume of Chest X-rays (CXRs) requiring interpretation and reporting.
Our proposed CXR report generator integrates elements of the workflow and introduces a novel reward for reinforcement learning.
Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models.
arXiv Detail & Related papers (2023-07-19T05:41:14Z)
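The entry above describes a semantic-similarity reward for reinforcement learning. A minimal version, sketched below, is the cosine similarity between embeddings of the generated and reference reports; the embedding model and the baseline scheme are left unspecified here and are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def semantic_reward(gen_emb: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
    # Scalar-per-sample reward: cosine similarity between the embedding of a
    # sampled report and the embedding of the radiologist's report. Unlike
    # n-gram metrics, paraphrases of the same finding still score highly.
    return F.cosine_similarity(gen_emb, ref_emb, dim=-1)

# In a self-critical setup (an assumption, not necessarily this paper's),
# the policy gradient would be weighted by the margin
# semantic_reward(sample_emb, ref_emb) - semantic_reward(greedy_emb, ref_emb).
```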
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking it down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z)
- Advancing Radiograph Representation Learning with Masked Record Modeling [52.04899592688968]
We formulate self- and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM).
MRM reconstructs masked image patches and masked report tokens following a multi-task scheme to learn knowledge-enhanced semantic representations.
Specifically, we find that MRM offers superior performance in label-efficient fine-tuning.
arXiv Detail & Related papers (2023-01-30T18:33:32Z)
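As a rough sketch of the multi-task scheme MRM describes, the loss below combines masked-image-patch reconstruction with masked-report-token prediction. The choice of MSE and cross-entropy and the weighting `lam` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MaskedRecordLoss(nn.Module):
    """Two complementary completion objectives, in the spirit of MRM."""

    def __init__(self, lam: float = 1.0):
        super().__init__()
        self.lam = lam  # relative weight of the report objective (assumed)
        self.mse = nn.MSELoss()                            # image-patch reconstruction
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)   # masked-token prediction

    def forward(self, patch_pred, patch_target, token_logits, token_targets):
        # Image completion: regress pixel values of masked patches.
        image_loss = self.mse(patch_pred, patch_target)
        # Report completion: classify masked tokens over the vocabulary;
        # token_logits is (batch, seq, vocab), token_targets is (batch, seq).
        report_loss = self.ce(token_logits.flatten(0, 1), token_targets.flatten())
        return image_loss + self.lam * report_loss
```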