MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation
- URL: http://arxiv.org/abs/2512.16145v1
- Date: Thu, 18 Dec 2025 03:57:55 GMT
- Title: MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation
- Authors: Pengyu Wang, Shuchang Ye, Usman Naseem, Jinman Kim
- Abstract summary: We propose a semantic-driven reinforcement learning (SRL) method for medical report generation. SRL encourages clinical-correctness-guided learning beyond imitation of language style. We evaluate medical report generation with SRL on two datasets: IU X-Ray and MIMIC-CXR.
- Score: 23.22547135801011
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical report generation (MRG) aims to automatically derive radiology-style reports from medical images to aid clinical decision-making. However, existing methods often generate text that mimics the linguistic style of radiologists but fails to guarantee clinical correctness, because they are trained with token-level objectives that focus on word choice and sentence structure rather than actual medical accuracy. We propose a semantic-driven reinforcement learning (SRL) method for medical report generation, applied to a large vision-language model (LVLM). SRL adopts Group Relative Policy Optimization (GRPO) to encourage clinical-correctness-guided learning beyond imitation of language style. Specifically, we optimise a report-level reward: a margin-based cosine similarity (MCCS) computed between key radiological findings extracted from the generated and reference reports, thereby directly optimizing clinical-label agreement and improving semantic correctness. A lightweight reasoning format constraint further guides the model to generate structured "thinking report" outputs. We evaluate Medical Report Generation with Semantic-driven Reinforcement Learning (MRG-R1) on two datasets, IU X-Ray and MIMIC-CXR, using clinical efficacy (CE) metrics. MRG-R1 achieves state-of-the-art performance with a CE-F1 of 51.88 on IU X-Ray and 40.39 on MIMIC-CXR. We found that label-semantic reinforcement outperforms conventional token-level supervision. These results indicate that optimizing a clinically grounded, report-level reward rather than token overlap meaningfully improves clinical correctness. This work is an early step toward exploring semantic reinforcement for supervising medical correctness in medical large vision-language model (Med-LVLM) training.
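The reward and optimization scheme described in the abstract can be sketched in code. The sketch below is an illustrative assumption, not the paper's released implementation: it treats each report as a binary vector over a fixed vocabulary of radiological findings (as a CheXbert-style labeler would produce), applies a margin to the cosine similarity between the generated and reference vectors, and standardizes rewards within a group of sampled reports, which is the core of GRPO. The `margin=0.1` value and the helper names are hypothetical.

```python
import numpy as np

def mccs_reward(gen_labels, ref_labels, margin=0.1):
    """Margin-based cosine similarity between finding-label vectors.

    gen_labels / ref_labels: binary vectors over a fixed set of
    radiological findings extracted from the generated and reference
    reports. The extractor and margin value here are illustrative
    assumptions, not the paper's exact specification.
    """
    gen = np.asarray(gen_labels, dtype=float)
    ref = np.asarray(ref_labels, dtype=float)
    denom = np.linalg.norm(gen) * np.linalg.norm(ref)
    if denom == 0.0:
        # Two empty (all-negative) reports agree; otherwise no overlap.
        cos = 1.0 if not gen.any() and not ref.any() else 0.0
    else:
        cos = float(gen @ ref / denom)
    # Only similarity above the margin earns reward.
    return max(0.0, cos - margin)

def grpo_advantages(rewards):
    """Group-relative advantages: standardize rewards within a group
    of reports sampled for the same image, so the policy is pushed
    toward above-average completions without a learned value model."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

Usage follows the GRPO recipe: sample several candidate reports per image, score each with `mccs_reward`, convert the group's rewards to advantages with `grpo_advantages`, and weight the policy-gradient update accordingly.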
Related papers
- Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering. Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition. We show that our approach significantly reduces the probability of historical hallucinations.
arXiv Detail & Related papers (2026-02-27T04:49:01Z) - A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which limits generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z) - Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting [37.57009831483529]
Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation. Our framework restructures generation into two distinct components: a think block for detailed findings and an answer block for structured disease labels.
arXiv Detail & Related papers (2026-01-06T14:17:44Z) - Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation [25.148217482604746]
We propose VALOR: Visual Alignment of Medical Vision-Language Models for Radiology Report Generation. Our method introduces a reinforcement learning-based post-alignment framework utilizing Group-Relative Proximal Optimization (GRPO). Experiments on multiple benchmarks demonstrate that VALOR substantially improves factual accuracy and visual grounding, achieving significant performance gains over state-of-the-art report generation methods.
arXiv Detail & Related papers (2025-12-18T05:48:21Z) - CLARIFID: Improving Radiology Report Generation by Reinforcing Clinically Accurate Impressions and Enforcing Detailed Findings [1.515687944002438]
We propose CLARIFID, a novel framework that directly optimizes diagnostic correctness by mirroring the two-step workflow of experts. CLARIFID learns the logical flow from Findings to Impression through section-aware pretraining. We show that our method achieves superior clinical efficacy and outperforms existing baselines on both standard NLG metrics and clinically aware scores.
arXiv Detail & Related papers (2025-07-23T05:57:59Z) - Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines [16.56254046507092]
We introduce GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language model outputs in authoritative guidelines. Unlike conventional Retrieval-Augmented Generation based approaches, GARMLE-G enables hallucination-free outputs by directly retrieving authoritative guideline content. A prototype system for hypertension diagnosis was developed and evaluated on multiple metrics, demonstrating superior retrieval precision, semantic relevance, and clinical guideline adherence.
arXiv Detail & Related papers (2025-06-22T11:31:13Z) - Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation [0.0]
This paper introduces CXR-PathFinder, a novel Large Language Model (LLM)-centric foundation model specifically engineered for automated chest X-ray (CXR) report generation. We propose a unique training paradigm, Clinician-Guided Adversarial Fine-Tuning (CGAFT), which meticulously integrates expert clinical feedback into an adversarial learning framework. Our experiments demonstrate that CXR-PathFinder significantly outperforms existing state-of-the-art medical vision-language models across various quantitative metrics.
arXiv Detail & Related papers (2025-06-01T18:47:49Z) - Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation [13.580272788409092]
BoxMed-RL is a groundbreaking unified training framework for generating spatially verifiable and explainable radiology reports. Built on a large vision-language model, BoxMed-RL revolutionizes report generation through two integrated phases. BoxMed-RL achieves an average 7% improvement in both METEOR and ROUGE-L metrics compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-04-25T16:05:06Z) - RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
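The general idea behind such entity-aware scoring can be illustrated with a toy sketch. This is not RaTEScore's actual formulation (which uses named-entity recognition and synonym-robust embedding similarity); it is a hypothetical simplification in which each entity is a (text, negated) pair, so a negated mention ("no effusion") never matches an affirmed one.

```python
def entity_f1(pred_entities, ref_entities):
    """Toy entity-overlap F1 over (text, negated) pairs.

    Illustrates why entity-aware metrics are sensitive to negation:
    ("effusion", True) and ("effusion", False) are distinct entities
    and never count as a match. A real metric like RaTEScore replaces
    exact matching with embedding-based, synonym-robust similarity.
    """
    pred, ref = set(pred_entities), set(ref_entities)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)          # exact-match true positives
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, a generated report affirming "effusion" scores zero against a reference that negates it, whereas token-overlap metrics like BLEU would reward the shared word.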
arXiv Detail & Related papers (2024-06-24T17:49:28Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Cross-Modal Causal Intervention for Medical Report Generation [107.76649943399168]
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance. Generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases. We propose a two-stage framework named Cross-Modal Causal Representation Learning (CMCRL). Experiments on IU-Xray and MIMIC-CXR show that our CMCRL pipeline significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-03-16T07:23:55Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.