Anatomy-R1: Enhancing Anatomy Reasoning in Multimodal Large Language Models via Anatomical Similarity Curriculum and Group Diversity Augmentation
- URL: http://arxiv.org/abs/2512.19512v2
- Date: Wed, 24 Dec 2025 05:32:41 GMT
- Title: Anatomy-R1: Enhancing Anatomy Reasoning in Multimodal Large Language Models via Anatomical Similarity Curriculum and Group Diversity Augmentation
- Authors: Ziyang Song, Zelin Zang, Zuyao Chen, Xusheng Liang, Dong Yi, Jinlin Wu, Hongbin Liu, Jiebo Luo, Zhen Lei
- Abstract summary: Multimodal Large Language Models (MLLMs) have achieved impressive progress in natural image reasoning, yet their potential in medical imaging remains underexplored, especially in clinical anatomical and surgical images. The complexity of medical data and the scarcity of high-quality expert annotations limit the effectiveness of conventional Supervised Fine-Tuning strategies.
- Score: 52.7583577508452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) have achieved impressive progress in natural image reasoning, yet their potential in medical imaging remains underexplored, especially in clinical anatomical and surgical images. Anatomy understanding tasks demand precise understanding and clinically coherent answers, which are difficult to achieve due to the complexity of medical data and the scarcity of high-quality expert annotations. These challenges limit the effectiveness of conventional Supervised Fine-Tuning (SFT) strategies. While recent work has demonstrated that Group Relative Policy Optimization (GRPO) can enhance reasoning in MLLMs without relying on large amounts of data, we find two weaknesses that hinder GRPO's reasoning performance in anatomy recognition: 1) knowledge cannot be effectively shared between different anatomical structures, resulting in uneven information gain and preventing the model from converging, and 2) the model quickly converges to a single reasoning path, suppressing the exploration of diverse strategies. To overcome these challenges, we propose two novel methods. First, we introduce a progressive learning strategy, Anatomical Similarity Curriculum Learning, which controls question difficulty via the similarity of answer choices, enabling the model to master complex problems incrementally. Second, we apply a question-augmentation strategy, Group Diversity Question Augmentation, which expands the model's search space for difficult queries, mitigating the tendency to produce uniform responses. Comprehensive experiments on the SGG-VQA and OmniMedVQA benchmarks show that our method achieves significant improvements across both benchmarks, demonstrating its effectiveness in enhancing the medical reasoning capabilities of MLLMs. The code can be found at https://github.com/tomato996/Anatomy-R1
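The abstract gives no implementation details, so the two ideas can only be sketched. In the sketch below, the token-overlap similarity, the difficulty threshold, and the paraphrase templates are stand-ins invented for illustration; the paper's actual anatomical similarity measure and augmentation prompts may differ:

```python
# Minimal sketch of the two proposed ideas, assuming multiple-choice VQA items
# of the form {"question": str, "choices": [str, ...]}. Jaccard token overlap
# stands in for the paper's anatomical similarity between answer choices.
from itertools import combinations

def choice_similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two answer choices (a placeholder
    for a learned or ontology-based anatomical similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def question_difficulty(choices: list[str]) -> float:
    """Questions whose answer choices are mutually similar are harder,
    since the model must discriminate near-identical anatomical labels."""
    pairs = list(combinations(range(len(choices)), 2))
    if not pairs:
        return 0.0
    return sum(choice_similarity(choices[i], choices[j]) for i, j in pairs) / len(pairs)

def build_curriculum(questions: list[dict]) -> list[dict]:
    """Anatomical Similarity Curriculum Learning: present questions with
    dissimilar (easy) answer choices first, similar (hard) choices last."""
    return sorted(questions, key=lambda q: question_difficulty(q["choices"]))

def augment_hard_questions(questions: list[dict], threshold: float = 0.5) -> list[dict]:
    """Group Diversity Question Augmentation: rephrase difficult questions so
    GRPO rollouts explore varied prompts instead of one dominant path."""
    templates = [
        "{q}",
        "In this surgical image, {q}",
        "Considering the visible anatomical structures: {q}",
    ]
    augmented = []
    for q in questions:
        n = len(templates) if question_difficulty(q["choices"]) >= threshold else 1
        for t in templates[:n]:
            augmented.append({**q, "question": t.format(q=q["question"])})
    return augmented
```

In this reading, the curriculum targets weakness 1) by letting knowledge accumulate on easy, dissimilar structures before fine-grained cases, while the augmentation targets weakness 2) by diversifying the rollout group for each hard question.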
Related papers
- MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but fall short of the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and clinical guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
- RAR$^2$: Retrieval-Augmented Medical Reasoning via Thought-Driven Retrieval [25.425621641226815]
Large Language Models (LLMs) have shown promising performance on diverse medical benchmarks. Retrieval-Augmented Generation (RAG) has emerged as a key approach for mitigating knowledge gaps and hallucinations by incorporating external medical information. We propose RAR$^2$, a joint learning framework that improves both Reasoning-Augmented Retrieval and Retrieval-Augmented Reasoning.
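The summary suggests a thought-driven retrieval loop; a minimal sketch of that reading follows, where `llm` and `search` are hypothetical callables rather than RAR$^2$'s actual interfaces:

```python
# Hypothetical sketch: the model first drafts a reasoning "thought", retrieves
# with the thought instead of the raw question, then answers over the evidence.
# `llm` and `search` are assumed callables, not RAR^2's real API.
def thought_driven_answer(question: str, llm, search, top_k: int = 5) -> str:
    thought = llm(
        "Think step by step about what medical knowledge is needed "
        f"to answer this question: {question}"
    )
    passages = search(thought, top_k=top_k)  # reasoning-augmented retrieval
    context = "\n".join(passages)
    # Retrieval-augmented reasoning: the final answer is grounded in evidence.
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```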
arXiv Detail & Related papers (2025-09-24T05:35:57Z)
- Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning [15.73558614478585]
We introduce MM-Retinal-Reason, the first ophthalmic multimodal dataset with the full spectrum of perception and reasoning. Building upon MM-Retinal-Reason, we propose OphthaReason, the first ophthalmology-specific multimodal reasoning model with step-by-step reasoning traces. Our model achieves state-of-the-art performance on both basic and complex reasoning tasks.
arXiv Detail & Related papers (2025-08-22T06:47:30Z)
- Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning [25.707757721296627]
Multimodal pathological image understanding has garnered widespread interest due to its potential to improve diagnostic accuracy. Existing methods exhibit limited reasoning capabilities, which hamper their ability to handle complex diagnostic scenarios. We introduce a novel bilateral reinforcement learning framework comprising two synergistic branches.
arXiv Detail & Related papers (2025-05-21T16:03:03Z)
- AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation [25.070570876478417]
Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings. Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency. Current Medical LMMs (MLMMs) still face two major challenges: (1) insufficient region-level understanding and interaction, and (2) limited interpretability due to single-step reasoning.
arXiv Detail & Related papers (2025-05-05T17:57:07Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
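As a rough illustration of the structured-reasoning idea (the schema below is invented for this sketch; the paper's actual output format is not reproduced here):

```python
import json

# Invented schema for illustration only: constrain a general-purpose LLM to
# emit its medical reasoning in machine-checkable fields.
SCHEMA_PROMPT = (
    'Answer the medical question as JSON with keys "findings" (observations), '
    '"differential" (ranked candidate diagnoses), "answer" (final answer), '
    'and "evidence" (supporting facts).'
)

def structured_answer(question: str, llm) -> dict:
    raw = llm(f"{SCHEMA_PROMPT}\n\nQuestion: {question}")
    return json.loads(raw)  # each field can then be scored, e.g. for factuality
```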
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning [27.827761004918106]
This work explores the potential of inference-time scaling in large language models (LLMs) for medical reasoning tasks. With a modest training set of 500 samples, our model yields substantial performance improvements of 6%-11%.
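The summary does not specify the scaling strategy; majority voting over sampled reasoning chains is one common realization of inference-time scaling, sketched here with a hypothetical `llm` sampler and not necessarily the paper's exact approach:

```python
from collections import Counter

# Generic inference-time scaling via self-consistency: sample several
# reasoning chains and return the most frequent final answer.
def majority_vote_answer(question: str, llm, n_samples: int = 8) -> str:
    answers = []
    for _ in range(n_samples):
        chain = llm(f"Reason step by step, then give a final answer.\n{question}",
                    temperature=0.8)                    # assumed sampling interface
        answers.append(chain.strip().splitlines()[-1])  # crude final-line parse
    return Counter(answers).most_common(1)[0][0]
```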
arXiv Detail & Related papers (2025-01-11T07:10:23Z)
- Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
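The two-step pipeline can be sketched as follows; the concept list, embedding dimension, and random vectors are placeholders so the example runs, standing in for GPT-4-queried concepts and a CLIP-style vision-language encoder:

```python
import numpy as np

# Placeholders so the sketch runs end to end: in the paper, the concepts come
# from GPT-4 and the embeddings from a vision-language model such as CLIP.
rng = np.random.default_rng(0)
concepts = ["ground-glass opacity", "pleural effusion", "cardiomegaly"]
concept_emb = rng.normal(size=(len(concepts), 512))  # text-encoder outputs
image_emb = rng.normal(size=512)                     # image-encoder output

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Step 1: project the latent image feature onto explicit clinical concepts.
concept_scores = normalize(concept_emb) @ normalize(image_emb)

# Step 2: a linear classifier over concept scores; its weights expose how much
# each concept contributes to each class, which makes the model interpretable.
W = rng.normal(size=(2, len(concepts)))  # 2 classes; trained in practice
logits = W @ concept_scores
print(dict(zip(concepts, concept_scores.round(3))), "predicted class:", int(logits.argmax()))
```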
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- Patched Diffusion Models for Unsupervised Anomaly Detection in Brain MRI [55.78588835407174]
We propose a method that reformulates the generation task of diffusion models as a patch-based estimation of healthy brain anatomy.
We evaluate our approach on data of tumors and multiple sclerosis lesions and demonstrate a relative improvement of 25.1% compared to existing baselines.
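A schematic of the patch-based scoring idea: regions that a healthy-anatomy model cannot reproduce score high. `reconstruct_healthy` is a placeholder for the trained diffusion model, which is not reproduced here:

```python
import numpy as np

def reconstruct_healthy(patch: np.ndarray) -> np.ndarray:
    """Placeholder for the diffusion model that estimates healthy anatomy for
    a patch; a patch-mean fill is used here only so the sketch runs."""
    return np.full_like(patch, patch.mean())

def anomaly_map(image: np.ndarray, patch: int = 32) -> np.ndarray:
    """Score each pixel by its deviation from the 'healthy' reconstruction,
    computed patch by patch as in the paper's patch-based formulation."""
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            p = image[y:y + patch, x:x + patch]
            out[y:y + patch, x:x + patch] = np.abs(p - reconstruct_healthy(p))
    return out

scores = anomaly_map(np.random.default_rng(0).random((128, 128)))
```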
arXiv Detail & Related papers (2023-03-07T09:40:22Z)
- Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.