Stone Needle: A General Multimodal Large-scale Model Framework towards
Healthcare
- URL: http://arxiv.org/abs/2306.16034v1
- Date: Wed, 28 Jun 2023 09:04:56 GMT
- Title: Stone Needle: A General Multimodal Large-scale Model Framework towards
Healthcare
- Authors: Weihua Liu and Yong Zuo
- Abstract summary: Stone Needle is a general multimodal large-scale model framework tailored explicitly for healthcare applications.
Our architecture can perform multi-modal interaction in multiple rounds of dialogue.
The fusion of different modalities and the ability to process complex medical information in Stone Needle benefit accurate diagnosis, treatment recommendations, and patient care.
- Score: 1.7894377200944511
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In healthcare, multimodal data, including medical images and
clinical reports, is prevalent and must be comprehensively analyzed before
diagnostic decisions. However, current large-scale artificial intelligence
models predominantly focus on single-modal cognitive abilities and neglect the
integration of multiple modalities. Therefore, we propose Stone Needle, a
general multimodal large-scale model framework tailored explicitly for
healthcare applications. Stone Needle serves as a comprehensive medical
multimodal model foundation, integrating various modalities such as text,
images, videos, and audio to surpass the limitations of single-modal systems.
Through the framework components of intent analysis, medical foundation models,
prompt manager, and medical language module, our architecture can perform
multi-modal interaction in multiple rounds of dialogue. As a general
multimodal large-scale model framework, our method integrates diverse
modalities and can be tailored to specific tasks. The experimental results demonstrate
the superior performance of our method compared to single-modal systems. The
fusion of different modalities and the ability to process complex medical
information in Stone Needle benefit accurate diagnosis, treatment
recommendations, and patient care.
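The four framework components named in the abstract suggest a staged dialogue loop. As a rough illustration only, here is a minimal Python sketch of how intent analysis, a prompt manager, modality-specific foundation models, and a medical language module might be orchestrated across rounds; every class, function, and interface below is a hypothetical assumption, not code from the paper:

```python
# Hypothetical sketch of a Stone-Needle-style dialogue loop: intent analysis
# routes each user turn, a prompt manager carries multi-round context, a
# modality-specific foundation model handles the input, and a medical
# language module phrases the final answer. No interface here is from the paper.
from dataclasses import dataclass, field


@dataclass
class Turn:
    modality: str    # "text", "image", "video", or "audio"
    content: object  # raw user input for this round


@dataclass
class PromptManager:
    history: list = field(default_factory=list)

    def build_prompt(self, turn, intent):
        # Fold recent rounds into the prompt so dialogue context persists.
        context = " | ".join(self.history[-5:])
        return f"[intent={intent}] [context={context}] {turn.content}"

    def record(self, turn, answer):
        self.history.append(f"{turn.modality}: {turn.content} -> {answer}")


def analyze_intent(turn):
    # Placeholder intent analysis; a real system would use a trained classifier.
    return "diagnosis" if turn.modality == "image" else "consultation"


def run_dialogue(turns, foundation_models, language_module):
    # foundation_models: dict mapping modality name -> callable backbone.
    # language_module: callable producing the final medical-language reply.
    pm = PromptManager()
    for turn in turns:
        prompt = pm.build_prompt(turn, analyze_intent(turn))
        raw = foundation_models[turn.modality](prompt)
        answer = language_module(raw)
        pm.record(turn, answer)
        yield answer
```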
Related papers
- MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation [40.9095393430871]
We introduce MedViLaM, a unified vision-language model designed as a step towards a generalist model for medical data.
MedViLaM can flexibly encode and interpret various forms of medical data, including clinical language and imaging.
We present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.
arXiv Detail & Related papers (2024-09-29T12:23:10Z)
- FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models [54.09244105445476]
This study introduces a novel knowledge injection approach, FedKIM, to scale the medical foundation model within a federated learning framework.
FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model.
Our experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings.
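As a loose illustration of this federated pattern, the minimal sketch below averages lightweight client-side adapter weights into a central model; the aggregation rule and all names are assumptions, not FedKIM's actual injection method:

```python
# Minimal sketch of federated knowledge injection by averaging lightweight
# local adapter weights into a central foundation model. This shows the
# general federated pattern only; it is not FedKIM's actual algorithm.
import torch


def train_local_adapter(local_data, dim=16):
    # Stand-in for client-side training on private data: each site returns
    # a small adapter weight matrix instead of sharing raw records.
    torch.manual_seed(hash(local_data) % 2**31)
    return torch.randn(dim, dim) * 0.01


def inject_knowledge(central_adapter, client_adapters):
    # Server-side aggregation: average client adapters and blend them into
    # the central adapter, so the foundation model never sees private data.
    avg = torch.stack(client_adapters).mean(dim=0)
    return 0.5 * central_adapter + 0.5 * avg


central = torch.zeros(16, 16)
clients = [train_local_adapter(site) for site in ("hospital_a", "hospital_b")]
central = inject_knowledge(central, clients)
```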
arXiv Detail & Related papers (2024-08-17T15:42:29Z)
- Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning.
AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
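For a concrete sense of one generic fusion strategy such a system could search over, the toy sketch below late-fuses a tabular model's and an imaging model's class probabilities; the fixed weight is illustrative, and this is not AutoPrognosis-M's actual implementation:

```python
# Toy late-fusion ensemble over a tabular model and an imaging model, one of
# the generic strategies an AutoML system could tune. The weight is fixed
# here for illustration only.
import torch


def late_fusion(p_tabular, p_imaging, w=0.6):
    # Weighted average of per-model class probabilities.
    return w * p_tabular + (1 - w) * p_imaging


probs = late_fusion(torch.tensor([0.8, 0.2]), torch.tensor([0.6, 0.4]))
```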
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
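As background for the MoE tuning step, the sketch below implements a standard top-k mixture-of-experts router; the sizes are arbitrary and this is generic machinery, not Med-MoE's specific design:

```python
# Generic top-k mixture-of-experts routing, the mechanism MoE-based models
# build on. Expert count and dimensions here are arbitrary.
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=4, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):
        # Route each token to its k highest-scoring experts and mix their
        # outputs by the renormalized gate weights.
        scores = self.gate(x).softmax(dim=-1)       # (batch, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # (batch, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


tokens = torch.randn(8, 64)
mixed = TopKMoE()(tokens)
```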
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
- Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation [12.094890186803958]
We present a novel Modality Aware and Shift Mixer that integrates intra-modality and inter-modality dependencies of multi-modal images for effective and robust brain tumor segmentation.
Specifically, we introduce a Modality-Aware module, informed by neuroimaging studies, to model specific modality-pair relationships at low levels, and develop a Modality-Shift module with specific mosaic patterns to explore the complex relationships across modalities at high levels via self-attention.
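As a minimal illustration of the high-level mechanism, the snippet below runs plain self-attention over concatenated tokens from several modalities; the mosaic shift patterns of the actual Modality-Shift module are omitted:

```python
# Minimal cross-modality self-attention sketch: tokens from all MRI
# modalities attend to one another in a single self-attention pass. This is
# the generic mechanism only, without the paper's mosaic shift patterns.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
# 4 modalities x 16 patch tokens each, concatenated along the sequence axis.
tokens = torch.randn(2, 4 * 16, 32)
mixed, _ = attn(tokens, tokens, tokens)  # every token sees every modality
```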
arXiv Detail & Related papers (2024-03-04T14:21:51Z)
- Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knowledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
- Specialty-Oriented Generalist Medical AI for Chest CT Screening [14.31187762890342]
We propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks.
M3FM consistently outperforms the state-of-the-art single-modal task-specific models.
As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine.
arXiv Detail & Related papers (2023-04-03T20:19:56Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed in specific information systems that make the same information available under different modalities.
This offers a unique opportunity to obtain and use, at training time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
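One common way to obtain such resilience is to drop whole modalities at training time; the sketch below shows that generic idea, not CMIM's information-maximization objective:

```python
# Minimal modality-dropout sketch: randomly zero out whole modalities during
# training so the fused representation stays usable when a modality is
# missing at test time. This illustrates the general idea only, not CMIM's
# actual mutual-information objective.
import torch


def fuse_with_modality_dropout(modalities, p_drop=0.3, training=True):
    # modalities: dict of name -> (batch, dim) features, all the same dim.
    feats = []
    for name, feat in modalities.items():
        if training and torch.rand(()) < p_drop:
            feat = torch.zeros_like(feat)  # simulate a missing modality
        feats.append(feat)
    # Mean-pool so the fused size is independent of how many survive.
    return torch.stack(feats).mean(dim=0)


batch = {"image": torch.randn(4, 32), "report": torch.randn(4, 32)}
fused = fuse_with_modality_dropout(batch)
```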
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into modality-specific appearance codes.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
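To make the gated-fusion idea from the last entry concrete, here is a minimal sketch of a learned gate weighting per-modality features before combination; it illustrates the generic mechanism, not the paper's disentanglement architecture:

```python
# Minimal sketch of gated multimodal fusion: a learned gate weighs each
# modality's features before they are combined, so an absent or unreliable
# modality can be down-weighted. Generic mechanism only, not the paper's
# exact architecture.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, dim=32, n_modalities=4):
        super().__init__()
        self.gate = nn.Linear(dim * n_modalities, n_modalities)

    def forward(self, feats):
        # feats: (batch, n_modalities, dim) per-modality appearance codes.
        b, m, d = feats.shape
        weights = self.gate(feats.reshape(b, m * d)).softmax(dim=-1)
        return (weights.unsqueeze(-1) * feats).sum(dim=1)  # (batch, dim)


codes = torch.randn(2, 4, 32)  # e.g. 4 MRI sequences, as in BRATS
fused = GatedFusion()(codes)
```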
This list is automatically generated from the titles and abstracts of the papers on this site.