M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging
- URL: http://arxiv.org/abs/2502.20301v1
- Date: Thu, 27 Feb 2025 17:29:46 GMT
- Title: M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging
- Authors: Jinghao Feng, Qiaoyu Zheng, Chaoyi Wu, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging. At its core, M3Builder employs four specialized agents that collaborate to tackle complex, multi-step medical ML workflows. Compared to existing ML agentic designs, M3Builder shows superior performance on completing ML tasks in medical imaging.
- Score: 54.40890979694209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agentic AI systems have gained significant attention for their ability to autonomously perform complex tasks. However, their reliance on well-prepared tools limits their applicability in the medical domain, which requires training specialized models. In this paper, we make three contributions: (i) We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging. At its core, M3Builder employs four specialized agents that collaborate to tackle complex, multi-step medical ML workflows, from automated data processing and environment configuration to self-contained auto debugging and model training. These agents operate within a medical imaging ML workspace, a structured environment designed to provide agents with free-text descriptions of datasets, training code, and interaction tools, enabling seamless communication and task execution. (ii) To evaluate progress in automated medical imaging ML, we propose M3Bench, a benchmark comprising four general tasks on 14 training datasets, across five anatomies and three imaging modalities, covering both 2D and 3D data. (iii) We experiment with seven state-of-the-art large language models serving as agent cores for our system, including the Claude series, GPT-4o, and DeepSeek-V3. Compared to existing ML agentic designs, M3Builder shows superior performance on completing ML tasks in medical imaging, achieving a 94.29% success rate with Claude-3.7-Sonnet as the agent core and demonstrating strong potential for fully automated machine learning in medical imaging.
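To make the workflow described in the abstract concrete, here is a minimal sketch of how four such agents might hand off a shared workspace: data processing, environment configuration, then a train/debug retry loop. The class and agent names (Workspace, DataProcessor, EnvConfigurator, ModelTrainer, AutoDebugger), the workspace fields, and the retry logic are illustrative assumptions, not M3Builder's published interface.

```python
# A hedged sketch of a four-agent ML-automation loop, assuming the roles
# named in the abstract; all identifiers here are illustrative, not the
# paper's actual API.
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """Shared ML workspace: free-text dataset description, code, and logs."""
    dataset_description: str
    training_code: str = ""
    logs: list = field(default_factory=list)

class Agent:
    """Stand-in for an LLM-driven agent; act() would call the agent core."""
    def __init__(self, name: str):
        self.name = name

    def act(self, ws: Workspace) -> Workspace:
        ws.logs.append(f"{self.name}: updated workspace")
        return ws

def training_failed(ws: Workspace) -> bool:
    # Placeholder check; a real system would parse stack traces from logs.
    return any("error" in line.lower() for line in ws.logs)

def run_pipeline(ws: Workspace, max_debug_rounds: int = 3) -> Workspace:
    data_agent = Agent("DataProcessor")    # automated data processing
    env_agent = Agent("EnvConfigurator")   # environment configuration
    train_agent = Agent("ModelTrainer")    # launches training runs
    debug_agent = Agent("AutoDebugger")    # self-contained auto debugging

    ws = data_agent.act(ws)
    ws = env_agent.act(ws)
    for _ in range(max_debug_rounds):
        ws = train_agent.act(ws)
        if not training_failed(ws):
            break                          # clean run: stop iterating
        ws = debug_agent.act(ws)           # patch code, retry training
    return ws

if __name__ == "__main__":
    ws = Workspace(dataset_description="chest X-ray classification, 2D")
    for line in run_pipeline(ws).logs:
        print(line)
```

The point of the sketch is the control flow: every agent reads and writes one structured workspace, and debugging is a bounded retry loop around training rather than a one-shot pass.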
Related papers
- OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding [35.35197484810533]
We present OmniV-Med, a unified framework for multimodal medical understanding.
We devise a rotary position-adaptive encoder that processes multi-resolution 2D/3D images and videos within a unified architecture.
We introduce a medical-aware token pruning mechanism that exploits spatial-temporal redundancy in volumetric data and medical videos (a toy pruning sketch follows below).
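As a rough illustration of token pruning over volumetric tokens, here is a minimal sketch; the norm-based saliency score and the keep-ratio are assumptions for illustration, not OmniV-Med's actual criterion.

```python
# Hedged sketch of saliency-based token pruning: drop low-information
# tokens from a volumetric token sequence. The L2-norm saliency heuristic
# is an assumption, not the paper's "medical-aware" rule.
import numpy as np

def prune_tokens(tokens: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """tokens: (num_tokens, dim). Keep the top keep_ratio fraction by L2 norm."""
    saliency = np.linalg.norm(tokens, axis=1)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(saliency)[-k:]
    return tokens[np.sort(keep)]  # preserve original token order

if __name__ == "__main__":
    vol_tokens = np.random.default_rng(1).normal(size=(1024, 64))
    print(prune_tokens(vol_tokens, 0.25).shape)  # (256, 64)
```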
arXiv Detail & Related papers (2025-04-20T17:53:56Z)
- Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model [16.93216342922561]
We propose Med-2E3, a novel MLLM for 3D medical image analysis that integrates 3D and 2D encoders.
To aggregate 2D features more effectively, we design a Text-Guided Inter-Slice (TG-IS) scoring module, which scores the attention of each 2D slice based on slice contents and task instructions (a toy scoring sketch follows below).
Experiments on a large-scale, open-source 3D medical multimodal benchmark demonstrate that Med-2E3 exhibits task-specific attention distribution and significantly outperforms current state-of-the-art models.
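For intuition, a minimal sketch of text-guided slice scoring follows: weight each 2D slice embedding by its similarity to an embedded task instruction. The embedding shapes and the scaled-softmax scoring rule are assumptions for illustration, not the paper's exact TG-IS formulation.

```python
# Hedged sketch of text-guided inter-slice scoring: softmax over
# text-slice dot products, then an attention-weighted slice aggregate.
import numpy as np

def tg_is_scores(slice_embs: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """slice_embs: (num_slices, dim); text_emb: (dim,). Returns (num_slices,) weights."""
    logits = slice_embs @ text_emb / np.sqrt(slice_embs.shape[1])
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def aggregate_slices(slice_embs: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    # Task-conditioned 2D feature: attention-weighted sum over slices.
    return tg_is_scores(slice_embs, text_emb) @ slice_embs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slices = rng.normal(size=(32, 256))  # 32 axial slices, 256-d features
    task = rng.normal(size=256)          # embedded task instruction
    print(aggregate_slices(slices, task).shape)  # (256,)
```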
arXiv Detail & Related papers (2024-11-19T09:59:59Z)
- Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning.
AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- MMedAgent: Learning to Use Medical Tools with Multi-modal Agent [27.314055140281432]
This paper introduces the first agent explicitly designed for the medical field, named Multi-modal Medical Agent (MMedAgent).
Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o.
arXiv Detail & Related papers (2024-07-02T17:58:23Z)
- M3H: Multimodal Multitask Machine Learning for Healthcare [7.4489490661717355]
M3H is an explainable multimodal multitask machine learning framework for healthcare.
It consolidates learning from data for supervised binary/multiclass classification, regression, and unsupervised clustering.
It consistently outperforms single-task models by an average of 11.6% across 40 disease diagnoses from 16 medical departments, three hospital operation forecasts, and one patient phenotyping task.
arXiv Detail & Related papers (2024-04-29T14:39:15Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
- M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models [49.5030774873328]
Previous research has primarily focused on 2D medical images, leaving 3D images under-explored, despite their richer spatial information.
We present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs.
We also introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks.
arXiv Detail & Related papers (2024-03-31T06:55:12Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knowledge-enhanced multimodal pretraining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z)