Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining
- URL: http://arxiv.org/abs/2304.14204v1
- Date: Wed, 26 Apr 2023 01:26:19 GMT
- Title: Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining
- Authors: Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu,
Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen,
Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang
- Abstract summary: Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR)
- Score: 121.89793208683625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical artificial general intelligence (MAGI) enables one foundation model
to solve different medical tasks, which is very practical in the medical
domain. It can significantly reduce the requirement of large amounts of
task-specific data by sufficiently sharing medical knowledge among different
tasks. However, due to the challenges of designing strongly generalizable
models with limited and complex medical data, most existing approaches tend to
develop task-specific models. To take a step towards MAGI, we propose a new
paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR). In
MOTOR, we combine two kinds of basic medical knowledge, i.e., general and
specific knowledge, in a complementary manner to boost the general pretraining
process. As a result, the foundation model with comprehensive basic knowledge
can learn compact representations from the radiographic pretraining data for
better cross-modal alignment. MOTOR unifies understanding and generation, which
are two kinds of core intelligence of an AI system, into a single medical
foundation model, to flexibly handle more diverse medical tasks. To enable a
comprehensive evaluation and facilitate further research, we construct a
medical multimodal benchmark including a wide range of downstream tasks, such
as chest x-ray report generation and medical visual question answering.
Extensive experiments on our benchmark show that MOTOR obtains promising
results through simple task-oriented adaptation. The visualization shows that
the injected knowledge successfully highlights key information in the medical
data, demonstrating the excellent interpretability of MOTOR. Our MOTOR
successfully mimics the human practice of first training as a "medical student" to
accelerate the process of becoming a "specialist". We believe that our work
makes a significant stride in realizing MAGI.
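To make the knowledge-injection idea concrete, here is a minimal sketch of knowledge-enhanced cross-modal alignment, assuming a contrastive image-report pretraining setup; the module names, dimensions, and concatenation-based fusion are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch: knowledge-conditioned contrastive alignment of radiographs
# and reports. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeEnhancedAligner(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, know_dim=256, proj_dim=512):
        super().__init__()
        # Fuse an injected knowledge embedding with each modality by
        # concatenation before projecting into the shared space.
        self.img_proj = nn.Linear(img_dim + know_dim, proj_dim)
        self.txt_proj = nn.Linear(txt_dim + know_dim, proj_dim)

    def forward(self, img_feat, txt_feat, knowledge):
        z_img = F.normalize(self.img_proj(torch.cat([img_feat, knowledge], -1)), dim=-1)
        z_txt = F.normalize(self.txt_proj(torch.cat([txt_feat, knowledge], -1)), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, temperature=0.07):
    # Standard symmetric InfoNCE over matched image-report pairs in the batch.
    logits = z_img @ z_txt.t() / temperature
    targets = torch.arange(z_img.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random features standing in for encoder outputs.
aligner = KnowledgeEnhancedAligner()
img = torch.randn(8, 768)    # radiograph encoder output
txt = torch.randn(8, 768)    # report encoder output
know = torch.randn(8, 256)   # retrieved knowledge embedding per sample
loss = contrastive_loss(*aligner(img, txt, know))
```

In this toy setup the same knowledge embedding conditions both modalities, so the contrastive objective pulls matched image-report pairs together in a knowledge-aware shared space.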
Related papers
- MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation [40.9095393430871]
We introduce MedViLaM, a unified vision-language model that takes a step towards a generalist model for medical data.
MedViLaM can flexibly encode and interpret various forms of medical data, including clinical language and imaging.
We present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.
arXiv Detail & Related papers (2024-09-29T12:23:10Z)
- FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models [54.09244105445476]
This study introduces a novel knowledge injection approach, FedKIM, to scale the medical foundation model within a federated learning framework.
FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model.
Our experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings.
arXiv Detail & Related papers (2024-08-17T15:42:29Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning (a routing sketch follows this entry).
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
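As a hedged illustration of the domain-specific expert routing named in the Med-MoE entry above, the sketch below implements top-k token routing over a small set of feed-forward experts; the expert count, dimensions, and top-k choice are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of domain-specific mixture-of-experts routing; all sizes
# and the top-k choice are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainMoE(nn.Module):
    def __init__(self, dim=512, num_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each domain expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        # Route each token to its top-k experts and mix their outputs by the
        # renormalized router probabilities.
        weights, idx = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = DomainMoE()
tokens = torch.randn(8, 512)  # fused multimodal token features
print(moe(tokens).shape)      # torch.Size([8, 512])
```

Only the selected experts run per token, which is what keeps such models lightweight relative to a dense network of the same total capacity.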
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
- Towards Generalist Biomedical AI [28.68106423175678]
We introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system.
Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data.
We conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales.
arXiv Detail & Related papers (2023-07-26T17:52:22Z)
- Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare [1.7894377200944511]
Stone Needle is a general multimodal large-scale model framework tailored explicitly for healthcare applications.
Our architecture can perform multi-modal interaction in multiple rounds of dialogue.
The fusion of different modalities and the ability to process complex medical information in Stone Needle benefits accurate diagnosis, treatment recommendations, and patient care.
arXiv Detail & Related papers (2023-06-28T09:04:56Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Specialty-Oriented Generalist Medical AI for Chest CT Screening [14.31187762890342]
We propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks.
M3FM consistently outperforms the state-of-the-art single-modal task-specific models.
As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine.
arXiv Detail & Related papers (2023-04-03T20:19:56Z)
- Competence-based Multimodal Curriculum Learning for Medical Report Generation [98.10763792453925]
We propose a Competence-based Multimodal Curriculum Learning framework (CMCL) to alleviate data bias and make the best use of the available data.
Specifically, CMCL simulates the learning process of radiologists and optimizes the model step by step (a sampling sketch follows this entry).
Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
arXiv Detail & Related papers (2022-06-24T08:16:01Z)
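As a loose illustration of the competence-based curriculum named in the CMCL entry above, the sketch below samples only training examples whose difficulty falls within the model's current competence; the square-root competence schedule and the difficulty scores are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of competence-based curriculum sampling.
import numpy as np

def competence(step, total_steps, c0=0.1):
    # Square-root schedule: competence grows from c0 to 1 over training,
    # so harder examples become eligible as training progresses.
    return min(1.0, np.sqrt(c0 ** 2 + (1 - c0 ** 2) * step / total_steps))

def sample_batch(difficulties, step, total_steps, batch_size, rng):
    # difficulties: per-example scores in [0, 1] (e.g., report length or
    # rarity of findings, normalized to a CDF).
    eligible = np.nonzero(difficulties <= competence(step, total_steps))[0]
    return rng.choice(eligible, size=batch_size, replace=True)

rng = np.random.default_rng(0)
difficulties = rng.random(10_000)  # stand-in difficulty values
batch = sample_batch(difficulties, step=100, total_steps=1_000, batch_size=16, rng=rng)
```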