Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
- URL: http://arxiv.org/abs/2507.23541v2
- Date: Sat, 02 Aug 2025 05:06:57 GMT
- Title: Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
- Authors: Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng, Bin Cui, Wentao Zhang
- Abstract summary: We introduce **Med-R$^3$**, a **Med**ical **R**etrieval-augmented **R**easoning framework driven by progressive **R**einforcement learning. In this framework, we first develop the model's ability to perform logical reasoning over medical problems. We then adaptively optimize the retrieval capability to better align with the characteristics of the knowledge corpus and external information utilization.
- Score: 31.58210903685538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In medical scenarios, effectively retrieving external knowledge and leveraging it for rigorous logical reasoning is of significant importance. Despite their potential, existing works have predominantly focused on enhancing either the retrieval or the reasoning capabilities of models in isolation, with little attention given to their joint optimization, which leads to limited coordination between the two processes. Additionally, current methods rely heavily on supervised fine-tuning (SFT), which can cause models to memorize existing problem-solving pathways, thereby restricting their generalization ability when confronted with novel problem contexts. Furthermore, while some studies have explored improving retrieval-augmented reasoning in general domains via reinforcement learning, their reward function designs do not adequately capture the specific demands of the medical domain. To address these challenges, we introduce **Med-R$^3$**, a **Med**ical **R**etrieval-augmented **R**easoning framework driven by progressive **R**einforcement learning. In this framework, we first develop the model's ability to perform logical reasoning over medical problems. Subsequently, on this foundation, we adaptively optimize the retrieval capability to better align with the characteristics of the knowledge corpus and with external information utilization throughout the reasoning process. Finally, we conduct joint optimization of the model's retrieval and reasoning coordination. Extensive experiments indicate that **Med-R$^3$** can achieve state-of-the-art performance, with LLaMA3.1-8B-Instruct + Med-R$^3$ surpassing the closed-source GPT-4o-mini by 3.93% at a comparable parameter scale, while Qwen2.5-14B augmented with Med-R$^3$ shows a more substantial gain of 13.53%.
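The staged recipe described in the abstract, from reasoning-only rewards, to retrieval-aware rewards, to joint optimization, can be pictured with a short sketch. The Python below is a minimal illustration of such a progressive reward schedule; the reward terms, weights, stage boundaries, and all function and field names are assumptions for exposition, not Med-R$^3$'s actual implementation.

```python
# Illustrative sketch of a progressive (staged) RL reward schedule for
# retrieval-augmented medical reasoning. All names, weights, and reward
# terms are assumptions for exposition; they are not Med-R^3's actual code.

from dataclasses import dataclass

@dataclass
class Rollout:
    answer: str            # model's final answer
    gold: str              # reference answer
    retrieved: list[str]   # documents fetched during reasoning
    cited: list[str]       # documents actually used in the rationale

def reasoning_reward(r: Rollout) -> float:
    """Stage 1: reward logical reasoning via answer correctness only."""
    return 1.0 if r.answer.strip().lower() == r.gold.strip().lower() else 0.0

def retrieval_reward(r: Rollout) -> float:
    """Stage 2: reward retrieval that the reasoning chain actually uses."""
    if not r.retrieved:
        return 0.0
    used = sum(1 for d in r.retrieved if d in r.cited)
    return used / len(r.retrieved)   # fraction of fetched evidence that was used

def joint_reward(r: Rollout, alpha: float = 0.7) -> float:
    """Stage 3: jointly optimize reasoning and retrieval coordination."""
    return alpha * reasoning_reward(r) + (1 - alpha) * retrieval_reward(r)

def reward_for_step(r: Rollout, step: int, s1: int = 1000, s2: int = 2000) -> float:
    """Progressive schedule: reasoning first, then retrieval, then joint."""
    if step < s1:
        return reasoning_reward(r)
    if step < s2:
        return retrieval_reward(r)
    return joint_reward(r)

if __name__ == "__main__":
    r = Rollout(answer="metformin", gold="Metformin",
                retrieved=["doc_a", "doc_b"], cited=["doc_a"])
    print(reward_for_step(r, step=2500))   # joint stage: 0.7*1.0 + 0.3*0.5 = 0.85
```

In this toy version, the same rollout is scored differently depending on training progress, which mirrors the idea of shaping reasoning first and only afterwards aligning retrieval behavior with how the reasoning process actually consumes evidence.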
Related papers
- Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging [0.0]
This study advances CBIR research for volumetric medical images through three key contributions. We introduce C-MIR, a novel volumetric re-ranking method adapting ColBERT's contextualized late interaction mechanism for 3D medical imaging. We demonstrate the successful adaptation of the late interaction principle to volumetric medical images, enabling effective context-aware re-ranking. (A minimal late-interaction scoring sketch appears after this list.)
arXiv Detail & Related papers (2025-07-23T11:12:52Z)
- Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training [0.0]
We present Gazal-R1, a 32-billion-parameter language model that achieves state-of-the-art performance in medical reasoning. Our model demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized domains. Gazal-R1 achieves exceptional performance across medical benchmarks, scoring 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA.
arXiv Detail & Related papers (2025-06-18T09:44:21Z)
- CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making [42.28216499263317]
We introduce Med-Zero-17K, a curated dataset for pure RL-based training, encompassing over 30 medical image modalities and 24 clinical tasks. We propose a novel large-scale RL framework for Med-VLMs, which integrates rewards to ensure fidelity between perception and reasoning, consistency in reasoning-to-answer, and rule-based accuracy for final responses.
arXiv Detail & Related papers (2025-06-15T13:42:46Z)
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains [52.86636270242863]
This work moves beyond final-answer accuracy and investigates step-by-step reasoning in the medical and mathematical domains. We introduce a fine-grained evaluation framework that judges the correctness of the knowledge used and the quality of reasoning. Using this framework, we study R1-distilled and base Qwen models trained with supervised fine-tuning (SFT) and/or reinforcement learning (RL) in the medical and math domains.
arXiv Detail & Related papers (2025-06-02T18:01:00Z)
- Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning [2.262453679768892]
We introduce MedCCO, the first multimodal reinforcement learning framework tailored for medical VQA. MedCCO is fine-tuned on a diverse set of close-ended medical VQA tasks to establish domain-grounded reasoning capabilities. We validate MedCCO across eight challenging medical VQA benchmarks, spanning both close-ended and open-ended settings.
arXiv Detail & Related papers (2025-05-25T16:20:55Z)
- RARE: Retrieval-Augmented Reasoning Modeling [41.24577920467858]
We propose Retrieval-Augmented Reasoning Modeling (RARE), a novel paradigm that decouples knowledge storage from reasoning optimization. RARE externalizes domain knowledge to retrievable sources and internalizes domain-specific reasoning patterns during training. Experiments demonstrate that lightweight-trained models (e.g., Llama-3.1-8B) can achieve state-of-the-art performance, surpassing retrieval-augmented GPT-4 and DeepSeek-R1 by up to approximately 20% in accuracy.
arXiv Detail & Related papers (2025-03-30T16:49:44Z)
- Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references. We propose a framework encompassing three critical stages, namely examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking, among others.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models [73.79091519226026]
Uncertainty of Thoughts (UoT) is an algorithm to augment large language models with the ability to actively seek information by asking effective questions. (A toy sketch of this kind of question selection appears after this list.)
In experiments on medical diagnosis, troubleshooting, and the 20 Questions game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion.
arXiv Detail & Related papers (2024-02-05T18:28:44Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies.
We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z)
- Resource Planning for Hospitals Under Special Consideration of the COVID-19 Pandemic: Optimization and Sensitivity Analysis [87.31348761201716]
Crises like the COVID-19 pandemic pose a serious challenge to health-care institutions.
BaBSim.Hospital is a tool for capacity planning based on discrete event simulation.
We aim to investigate and optimize the simulation's parameters to improve BaBSim.Hospital.
arXiv Detail & Related papers (2021-05-16T12:38:35Z)
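For readers unfamiliar with the late-interaction mechanism that the C-MIR entry above adapts, the following Python is a minimal sketch of ColBERT-style MaxSim scoring and re-ranking over generic embedding matrices. Treating 3D volume patches as "tokens", the array shapes, and the function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal late-interaction (MaxSim) scoring in the style of ColBERT:
# each query token embedding matches its best candidate token/patch
# embedding, and the per-token maxima are summed. Shapes and the idea of
# treating 3D volume patches as "tokens" are illustrative assumptions.

import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """query_emb: (Tq, d) L2-normalized; doc_emb: (Td, d) L2-normalized."""
    sim = query_emb @ doc_emb.T          # (Tq, Td) cosine similarities
    return float(sim.max(axis=1).sum())  # best match per query token, summed

def rerank(query_emb: np.ndarray, candidates: dict[str, np.ndarray]) -> list[str]:
    """Re-rank an initial candidate list by late-interaction score."""
    scores = {cid: maxsim_score(query_emb, emb) for cid, emb in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    q = norm(rng.normal(size=(4, 128)))                       # query "tokens"
    cands = {f"vol_{i}": norm(rng.normal(size=(32, 128))) for i in range(3)}
    print(rerank(q, cands))
```

The design point of late interaction is that query and candidate embeddings meet only at scoring time, so candidate embeddings can be precomputed offline and the interaction reduces to a per-query-token maximum over cosine similarities.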
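Similarly, the question-selection idea behind the Uncertainty of Thoughts entry can be illustrated with a toy expected-information-gain computation; the hypothesis prior, the way a candidate question splits the hypotheses, and all names below are placeholder assumptions, not the UoT algorithm's actual machinery.

```python
# Toy sketch of uncertainty-aware question selection: score a candidate
# question by how much its expected answer reduces entropy over the
# remaining hypotheses. The hypothesis/answer model here is a placeholder
# assumption, not the UoT paper's implementation.

import math

def entropy(probs: list[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_info_gain(prior: dict[str, float], splits: dict[str, set[str]]) -> float:
    """splits maps each possible answer (e.g. 'yes'/'no') to the hypotheses it keeps."""
    h_before = entropy(list(prior.values()))
    h_after = 0.0
    for kept in splits.values():
        p_answer = sum(prior[h] for h in kept)
        if p_answer == 0:
            continue
        cond = [prior[h] / p_answer for h in kept]
        h_after += p_answer * entropy(cond)
    return h_before - h_after

if __name__ == "__main__":
    prior = {"flu": 0.4, "covid": 0.4, "allergy": 0.2}
    # Candidate question: "Do you have a fever?" (assumed to split the hypotheses)
    splits = {"yes": {"flu", "covid"}, "no": {"allergy"}}
    print(round(expected_info_gain(prior, splits), 3))  # ~0.722 bits of expected gain
```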