MedRAX: Medical Reasoning Agent for Chest X-ray
- URL: http://arxiv.org/abs/2502.02673v1
- Date: Tue, 04 Feb 2025 19:31:00 GMT
- Title: MedRAX: Medical Reasoning Agent for Chest X-ray
- Authors: Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang
- Abstract summary: Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care.
We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework.
- Score: 3.453950193734893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. Data and code are publicly available at https://github.com/bowang-lab/MedRAX
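As a concrete illustration of the pattern the abstract describes, the sketch below shows a minimal tool-dispatch loop in which a multimodal LLM repeatedly selects a specialized CXR tool, accumulates its findings, and then answers. The tool names, prompts, and the `llm` callable are illustrative assumptions, not MedRAX's actual interface; see the linked repository for the real implementation.

```python
# Minimal sketch of a tool-dispatching CXR agent loop, in the spirit of
# MedRAX. Tool names, prompts, and the LLM client are illustrative
# assumptions; see https://github.com/bowang-lab/MedRAX for the real code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]  # takes an image path, returns a text finding

def classify_pathologies(image_path: str) -> str:
    return "cardiomegaly: 0.82, effusion: 0.10"  # stand-in for a real classifier

def generate_report(image_path: str) -> str:
    return "Enlarged cardiac silhouette. No focal consolidation."  # stand-in

TOOLS = [
    Tool("pathology_classifier", "Scores common CXR pathologies.", classify_pathologies),
    Tool("report_generator", "Drafts a free-text radiology report.", generate_report),
]

def answer_query(llm, image_path: str, query: str, max_steps: int = 3) -> str:
    """Let the LLM pick tools until it is ready to answer; no extra training."""
    findings: list[str] = []
    menu = "\n".join(f"- {t.name}: {t.description}" for t in TOOLS)
    for _ in range(max_steps):
        choice = llm(
            f"Query: {query}\nFindings so far: {findings or 'none'}\n"
            f"Available tools:\n{menu}\nReply with a tool name, or ANSWER to finish."
        ).strip()
        tool = next((t for t in TOOLS if t.name == choice), None)
        if tool is None:  # 'ANSWER' or anything unrecognized ends the loop
            break
        findings.append(f"{tool.name}: {tool.run(image_path)}")
    return llm(f"Query: {query}\nFindings: {findings}\nGive the final answer.")
```

Because every tool is wrapped behind the same text-in, text-out interface, new analysis models can be added to the menu without retraining the orchestrating LLM, which is the property the abstract emphasizes.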
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.
MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
- CX-Mind: A Pioneering Multimodal Large Language Model for Interleaved Reasoning in Chest X-ray via Curriculum-Guided Reinforcement Learning [28.737391224748798]
We propose CX-Mind, the first generative model to achieve interleaved "think-answer" reasoning for chest X-ray (CXR) tasks.
CX-Mind is driven by curriculum reinforcement learning and verifiable process rewards (RL-VPR).
Experiments demonstrate that CX-Mind significantly outperforms existing medical and general-domain MLLMs in visual understanding, text generation, and alignment.
arXiv Detail & Related papers (2025-07-31T05:07:18Z)
- RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation.
The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence-based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z)
- MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning [63.63542462400175]
We propose MMedAgent-RL, a reinforcement learning-based multi-agent framework that enables dynamic, optimized collaboration among medical agents.
Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multiple specialists.
Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns.
arXiv Detail & Related papers (2025-05-31T13:22:55Z)
- Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning [18.15610003617933]
We present CXRTrek, a new multi-stage visual question answering (VQA) dataset for chest X-ray (CXR) interpretation.
The dataset is designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings.
We propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the framework.
arXiv Detail & Related papers (2025-05-29T06:30:40Z)
- A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography [1.2289361708127877]
Multi-Stage Adaptive Vision-Language Tuning (MAViLT) is a novel framework designed to enhance multimodal reasoning and generation for vision-based understanding.
MAViLT incorporates a clinical gradient-weighted tokenization process and a hierarchical fine-tuning strategy, enabling it to generate accurate radiology reports, synthesize realistic CXRs from text, and answer vision-based clinical questions.
We evaluate MAViLT on two benchmark datasets, MIMIC-CXR and Indiana University CXR, achieving state-of-the-art results across all tasks.
arXiv Detail & Related papers (2025-02-09T15:02:57Z)
- Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
Large language models (LLMs) offer enhanced accuracy and interpretability across various domains.
This paper aims to investigate the prerequisite question for building concrete radiology agents.
First, we present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents.
Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow.
arXiv Detail & Related papers (2024-12-12T18:20:16Z)
- MedChain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking [58.25862290294702]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow.
We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and an MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z)
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce the Medical Retrieval-Augmented Generation Benchmark (MedRGB), which provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents (a minimal RAG sketch follows the citation below).
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
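To make the pipeline under evaluation concrete, below is a minimal retrieval-augmented generation sketch of the kind MedRGB stress-tests; the corpus, TF-IDF retriever, and prompt format are illustrative assumptions, not the benchmark's own setup.

```python
# A minimal retrieval-augmented generation (RAG) sketch of the kind of
# pipeline MedRGB is designed to stress-test. The corpus, query, and
# prompt format are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CORPUS = [
    "Pleural effusion appears as blunting of the costophrenic angle.",
    "Cardiomegaly is a cardiothoracic ratio greater than 0.5 on PA films.",
    "Pneumothorax shows a visible visceral pleural line with absent markings.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank corpus passages by TF-IDF cosine similarity to the query."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(CORPUS)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [CORPUS[i] for i in top]

def rag_answer(llm, query: str) -> str:
    """Answer grounded in retrieved passages; noisy or misleading passages
    are exactly the failure mode MedRGB's results highlight."""
    passages = "\n".join(retrieve(query))
    return llm(f"Context:\n{passages}\n\nQuestion: {query}\nAnswer:")
```

- A foundation model for generalizable disease diagnosis in chest X-ray images [40.9095393430871]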
We introduce CXRBase, a foundational model designed to learn versatile representations from unlabelled CXR images.
CXRBase is trained on a substantial dataset of 1.04 million unlabelled CXR images.
It is fine-tuned with labeled data to enhance its performance in disease detection.
arXiv Detail & Related papers (2024-10-11T14:41:27Z)
- Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning [33.9544297423474]
We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays.
We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation.
Our findings suggest that self-supervision enables patient-centric AI that proves useful in clinical workflows and interprets X-rays holistically.
arXiv Detail & Related papers (2024-05-02T16:59:10Z)
- MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning [6.4136876268620115]
MLVICX is an approach to capture rich representations, in the form of embeddings, from chest X-ray images.
We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning (a variance-covariance loss sketch follows the citation below).
arXiv Detail & Related papers (2024-03-18T06:19:37Z)
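The paper's name points to a variance-covariance objective; as a hedged illustration, the sketch below implements a VICReg-style variance-covariance regularizer. The exact multi-level formulation in MLVICX may differ.

```python
# A hedged sketch of a variance-covariance regularizer of the kind suggested
# by MLVICX's name (in the style of VICReg); the paper's actual multi-level
# formulation may differ.
import torch

def variance_covariance_loss(z: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """z: (batch, dim) embeddings from the chest X-ray encoder."""
    n, d = z.shape
    z = z - z.mean(dim=0)
    # Variance term: push each embedding dimension's std toward 1
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(1.0 - std).mean()
    # Covariance term: decorrelate dimensions (drive off-diagonals to 0)
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    return var_loss + cov_loss
```

Both terms discourage representational collapse: the variance term keeps each dimension informative, and the covariance term spreads information across dimensions.

- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]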
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor, as player, and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation [22.8169684575764]
Over 1.4 billion chest X-rays (CXRs) are performed annually due to their cost-effectiveness as an initial diagnostic test.
This scale of radiological studies provides a significant opportunity to streamline CXR interpretation and documentation.
We constructed a large-scale dataset (CheXinstruct), which we utilized to train a vision-language foundation model (CheXagent).
arXiv Detail & Related papers (2024-01-22T18:51:07Z)
- MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation [28.497591315598402]
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks.
This study investigates the potential of MLLMs in improving the understanding and generation of Chest X-Rays (CXRs).
arXiv Detail & Related papers (2023-12-04T06:40:12Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Instrumental Variable Learning for Chest X-ray Classification [52.68170685918908]
We propose an interpretable instrumental variable (IV) learning framework to eliminate the spurious association and obtain accurate causal representation.
The performance of our approach is demonstrated using the MIMIC-CXR, NIH ChestX-ray 14, and CheXpert datasets (a two-stage least squares sketch follows the citation below).
arXiv Detail & Related papers (2023-05-20T03:12:23Z)
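As background for the entry above, the sketch below shows classical two-stage least squares (2SLS), the textbook instrumental-variable estimator; the paper's learning framework builds on the IV idea, but its actual architecture is not reproduced here.

```python
# A hedged sketch of classical two-stage least squares (2SLS), the textbook
# instrumental-variable estimator; the paper's deep IV framework builds on
# this idea, but its actual architecture may differ.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                       # instrument: affects x, not y directly
u = rng.normal(size=n)                       # unobserved confounder
x = z + u + rng.normal(size=n)               # treatment, confounded by u
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # outcome; true causal effect = 2

# Naive OLS is biased upward by the confounder u
ols = (x @ y) / (x @ x)

# Stage 1: project the treatment onto the instrument
x_hat = z * ((z @ x) / (z @ z))
# Stage 2: regress the outcome on the projected treatment
iv = (x_hat @ y) / (x_hat @ x_hat)

print(f"OLS estimate: {ols:.2f} (biased), IV estimate: {iv:.2f} (close to 2)")
```

The instrument isolates variation in the treatment that is independent of the confounder, which is the same spurious-association problem the entry targets in CXR classification.

- Image Embedding and Model Ensembling for Automated Chest X-Ray Interpretation [0.0]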
We present and study several machine learning approaches to develop automated Chest X-ray diagnostic models.
In particular, we trained several Convolutional Neural Networks (CNN) on the CheXpert dataset.
We then used the trained CNNs to compute embeddings of the CXR images and trained two sets of tree-based classifiers on them (a sketch of this pipeline follows the citation below).
arXiv Detail & Related papers (2021-05-05T14:48:59Z)
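The entry above describes a concrete two-stage pipeline: CNNs trained on CheXpert yield image embeddings, which then feed tree-based classifiers. Below is a minimal sketch under assumed choices (a DenseNet-121 backbone and gradient-boosted trees); the paper's exact backbones and classifier settings may differ.

```python
# A minimal sketch of the embedding-plus-tree-classifier pipeline described
# above; the backbone choice and classifier settings are illustrative
# assumptions, not the paper's exact configuration.
import numpy as np
import torch
import torchvision.models as models
from sklearn.ensemble import GradientBoostingClassifier

# Use a pretrained CNN as a frozen feature extractor (embedding model)
backbone = models.densenet121(weights="DEFAULT")
backbone.classifier = torch.nn.Identity()  # expose the 1024-d features
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) normalized CXR tensors -> (N, 1024) embeddings."""
    return backbone(images).cpu().numpy()

# Train a tree-based classifier on the CNN embeddings
# (stand-in data; in the paper this would be CheXpert images and labels)
X = embed(torch.randn(32, 3, 224, 224))
y = np.random.randint(0, 2, size=32)  # e.g., presence of a finding
clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict(X[:4]))
```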
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.