Random Direct Preference Optimization for Radiography Report Generation
- URL: http://arxiv.org/abs/2509.21351v1
- Date: Fri, 19 Sep 2025 10:53:45 GMT
- Title: Random Direct Preference Optimization for Radiography Report Generation
- Authors: Valentin Samokhin, Boris Shirokikh, Mikhail Goncharov, Dmitriy Umerenkov, Maksim Bobrin, Ivan Oseledets, Dmitry Dylov, Mikhail Belyaev
- Abstract summary: Radiography Report Generation (RRG) has gained significant attention in medical image analysis. Existing methods have yet to achieve the quality required for deployment in real-world clinical settings. We introduce a model-agnostic framework to enhance RRG accuracy using Direct Preference Optimization (DPO).
- Score: 3.5915338392912344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Radiography Report Generation (RRG) has gained significant attention in medical image analysis as a promising tool for alleviating the growing workload of radiologists. However, despite numerous advancements, existing methods have yet to achieve the quality required for deployment in real-world clinical settings. Meanwhile, large Visual Language Models (VLMs) have demonstrated remarkable progress in the general domain by adopting training strategies originally designed for Large Language Models (LLMs), such as alignment techniques. In this paper, we introduce a model-agnostic framework to enhance RRG accuracy using Direct Preference Optimization (DPO). Our approach leverages random contrastive sampling to construct training pairs, eliminating the need for reward models or human preference annotations. Experiments on supplementing three state-of-the-art models with our Random DPO show that our method improves clinical performance metrics by up to 5%, without requiring any additional training data.
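The abstract does not spell out how the random pairs are drawn, so the following is only a minimal sketch of one plausible reading: the reference report of a study is treated as the preferred response, and a report sampled at random from a different study as the rejected one. The helper names `build_random_pairs` and `dpo_loss`, the `beta` default, and the pairing rule itself are assumptions for illustration, not the authors' implementation. The loss is the standard DPO objective, computed here for a single pair from summed sequence log-probabilities.

```python
import math
import random


def build_random_pairs(reports, seed=0):
    """Pair each reference report (chosen) with a random report from another study (rejected).

    Hypothetical helper: the pairing rule is an assumption, not taken from the paper.
    """
    if len(reports) < 2:
        raise ValueError("need at least two reports to form a contrastive pair")
    rng = random.Random(seed)
    pairs = []
    for i, chosen in enumerate(reports):
        # Draw an index uniformly from all positions except i.
        j = rng.randrange(len(reports) - 1)
        if j >= i:
            j += 1
        pairs.append({"study": i, "chosen": chosen, "rejected": reports[j]})
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    toy_reports = [
        "No acute cardiopulmonary abnormality.",
        "Right lower lobe consolidation.",
        "Mild cardiomegaly without effusion.",
    ]
    for pair in build_random_pairs(toy_reports):
        print(pair["study"], "|", pair["chosen"], "|", pair["rejected"])
    # Toy log-probabilities: the policy prefers the chosen report more than the reference does.
    print(dpo_loss(-10.0, -14.0, -11.0, -12.0))
```

In an actual training setup the four log-probabilities would come from the report generator being fine-tuned and a frozen copy of it, both conditioned on the same radiograph; this sketch leaves that part out.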
Related papers
- X-ray Insights Unleashed: Pioneering the Enhancement of Multi-Label Long-Tail Data [86.52299247918637]
Long-tailed pulmonary anomalies in chest radiography present formidable diagnostic challenges. Despite the recent strides in diffusion-based methods for enhancing the representation of tailed lesions, the paucity of rare lesion exemplars curtails the generative capabilities of these approaches. We propose a novel data synthesis pipeline designed to augment tail lesions utilizing a copious supply of conventional normal X-rays.
arXiv Detail & Related papers (2025-12-24T06:14:55Z) - Model Agnostic Preference Optimization for Medical Image Segmentation [5.289507655906182]
Preference optimization offers a scalable supervision paradigm based on relative preference signals. We propose MAPO (Model-Agnostic Preference Optimization), a training framework that utilizes Dropout-driven segmentation hypotheses. MAPO is fully dimensionality-agnostic, supporting 2D/3D CNN and Transformer-based segmentation pipelines.
arXiv Detail & Related papers (2025-12-17T01:50:52Z) - EMRRG: Efficient Fine-Tuning Pre-trained X-ray Mamba Networks for Radiology Report Generation [16.23892817333913]
EMRRG is a novel X-ray report generation framework that fine-tunes pre-trained Mamba networks. An LLM with a hybrid decoder generates the medical report, enabling end-to-end training and achieving strong results on benchmark datasets.
arXiv Detail & Related papers (2025-10-19T09:54:36Z) - Fake it till You Make it: Reward Modeling as Discriminative Prediction [49.31309674007382]
GAN-RM is an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering. Our method trains the reward model through discrimination between a small set of representative, unpaired target samples. Experiments demonstrate our GAN-RM's effectiveness across multiple key applications.
arXiv Detail & Related papers (2025-06-16T17:59:40Z) - Efficient Medical VIE via Reinforcement Learning [10.713109515157475]
Visual Information Extraction (VIE) converts unstructured document images into structured formats, which is critical for medical applications like report analysis and online consultations. Traditional methods rely on OCR and language models, while end-to-end multimodal models offer direct generation. We base our approach on the Reinforcement Learning with Verifiable Rewards (RLVR) framework to address these challenges using only 100 annotated samples.
arXiv Detail & Related papers (2025-06-16T11:10:25Z) - Online Iterative Self-Alignment for Radiology Report Generation [10.287396040943575]
This paper proposes a novel Online Iterative Self-Alignment (OISA) method for Radiology Report Generation (RRG). Our approach allows for generating varied reports tailored to specific clinical objectives, enhancing the overall performance of the RRG model iteratively.
arXiv Detail & Related papers (2025-05-17T12:31:12Z) - Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis [4.803310914375717]
This study evaluates three vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks. The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs.
arXiv Detail & Related papers (2025-04-22T17:20:34Z) - Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning [59.11519451499754]
Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences. Recent work has shown DPO's effectiveness relies on training data quality. We discover that reference model probability space naturally detects high-quality training samples.
arXiv Detail & Related papers (2025-01-25T07:21:50Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
This work explores training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models [67.2867506736665]
We propose an idea for out-of-distribution generalization of chest X-ray pathologies that uses a simple balanced batch sampling technique.
We observed that balanced sampling between the multiple training datasets improves the performance over baseline models trained without balancing.
arXiv Detail & Related papers (2021-02-21T14:03:49Z) - Training custom modality-specific U-Net models with weak localizations for improved Tuberculosis segmentation and localization [0.6999740786886535]
UNet segmentation models have demonstrated superior performance compared to conventional handcrafted features.
We train custom chest X-ray modality-specific UNet models for semantic segmentation of Tuberculosis-consistent findings.
arXiv Detail & Related papers (2021-02-21T14:03:49Z)