Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization
- URL: http://arxiv.org/abs/2512.00601v2
- Date: Wed, 03 Dec 2025 12:24:00 GMT
- Title: Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization
- Authors: Boyang Gu, Hongjian Zhou, Bradley Max Segal, Jinge Wu, Zeyu Cao, Hantao Zhong, Lei Clifton, Fenglin Liu, David A. Clifton,
- Abstract summary: We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method.<n>CRPO integrates rule-based and verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation.
- Score: 28.610758740650407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning, demonstrated by DeepSeek-R1. However, current post-training methods, such as Grouped Relative Policy Optimization (GRPO), mainly reward correctness, which is not aligned with the multi-dimensional objectives required in high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based and verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. The experiments on three benchmarks demonstrate that our CRPO substantially improves reasoning on truthfulness and completeness over standard GRPO while maintaining comfortable accuracy enhancements. This framework provides a scalable pathway to align LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare while also highlighting the potential of multi-objective, verifiable RL methods in post-training scaling of LLMs for medical domains.
Related papers
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis.<n>Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems.<n>We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z) - MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks with complex clinical reasoning required in real-world scenarios.<n>We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z) - Multidimensional Rubric-oriented Reward Model Learning via Geometric Projection Reference Constraints [4.79357178898034]
We introduce MR-RML (Multidimensional-oriented Reward Model Learning) with GPRC (Geometric Projection Reference Constraints)<n>Our approach introduces three key innovations: (1) a medical standard system that embeds domain-specific guidelines throughout the training pipeline; (2) an independent multi-dimensional reward model that decomposes evaluation criteria; and (3) projection reference constraints that translate clinical cognitive logic into mathematical regularization.<n>Our method significantly boosts the performance of the base Qwen-32B model, with improvements of 45% on the full subset and 85% on the hard subset.
arXiv Detail & Related papers (2025-11-20T08:26:16Z) - MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA)<n>We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context.<n>We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z) - OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction [2.904892426557913]
Large language models (LLMs) have shown strong performance in biomedical NLP.<n>We present a unified, multi-task learning framework that aligns autoregressive LLMs with clinical reasoning for outcome prediction.<n>Our findings underscore the importance of reasoning-aware alignment in multi-task clinical modeling.
arXiv Detail & Related papers (2025-10-20T13:35:12Z) - Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning [6.778254993886297]
We introduce Fleming-R1, a model designed for verifiable medical reasoning through three complementary innovations.<n>First, our Reasoning-Oriented Data Strategy (RODS) combines curated medical QA datasets with knowledge-graph-guided synthesis.<n>Second, we employ Chain-of-Thought (CoT) cold start to distill high-quality reasoning trajectories from teacher models.<n>Third, we implement a two-stage Reinforcement Learning from Verifiable Rewards framework.
arXiv Detail & Related papers (2025-09-18T13:35:14Z) - Are Large Language Models Dynamic Treatment Planners? An In Silico Study from a Prior Knowledge Injection Angle [3.0391297540732545]
We evaluate large language models (LLMs) as dynamic insulin dosing agents in an in silico Type 1 diabetes simulator.<n>Our results indicate that carefully designed zero-shot prompts enable smaller LLMs to achieve comparable or superior clinical performance.<n>LLMs exhibit notable limitations, such as overly aggressive insulin dosing when prompted with chain-of-thought.
arXiv Detail & Related papers (2025-08-06T13:46:02Z) - Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR)<n>We present a theoretical characterization of LLM reasoning grounded in information bottleneck (IB) principle.<n>We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
arXiv Detail & Related papers (2025-07-24T13:14:25Z) - Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.<n>We propose a framework encompassing three critical examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.<n>Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking, etc.
arXiv Detail & Related papers (2025-03-06T18:35:39Z) - Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging [0.33554367023486936]
Cancer staging status is available in clinical reports, but it requires natural language processing to extract it.
With the advance in clinical-oriented large language models, it is promising to extract such status without extensive efforts in training the algorithms.
In this study, we propose an ensemble reasoning approach with the aim of improving the consistency of the model generations.
arXiv Detail & Related papers (2024-04-19T19:34:35Z) - Variational Empowerment as Representation Learning for Goal-Based
Reinforcement Learning [114.07623388322048]
We discuss how the standard goal-conditioned RL (GCRL) is encapsulated by the objective variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z) - GaNDLF: A Generally Nuanced Deep Learning Framework for Scalable
End-to-End Clinical Workflows in Medical Imaging [76.38169390121057]
We present the community-driven Generally Nuanced Deep Learning Framework (GaNDLF)
GaNDLF makes the mechanism of DL development, training, and inference more stable, reproducible, interpretable, and scalable.
We demonstrate the ability of GaNDLF to analyze both radiology and histology images, with built-in support for k-fold cross-validation.
arXiv Detail & Related papers (2021-02-26T02:24:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.