Building Universal Foundation Models for Medical Image Analysis with
Spatially Adaptive Networks
- URL: http://arxiv.org/abs/2312.07630v2
- Date: Wed, 24 Jan 2024 04:04:26 GMT
- Title: Building Universal Foundation Models for Medical Image Analysis with
Spatially Adaptive Networks
- Authors: Lingxiao Luo, Xuanzhong Chen, Bingda Tang, Xinsheng Chen, Rong Han,
Chengpeng Hu, Yujiang Li, Ting Chen
- Abstract summary: We propose a universal foundation model for medical image analysis that processes images with heterogeneous spatial properties using a unified structure.
We pre-train a spatial adaptive visual tokenizer (SPAD-VT) and then a spatial adaptive Vision Transformer (SPAD-ViT) via masked image modeling (MIM) on 55 public medical image datasets.
The experimental results on downstream medical image classification and segmentation tasks demonstrate the superior performance and label efficiency of our model.
- Score: 5.661631789478932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in foundation models, typically trained with
self-supervised learning on large-scale and diverse datasets, have shown great
potential in medical image analysis. However, due to the significant spatial
heterogeneity of medical imaging data, current models must tailor specific
structures for different datasets, making it challenging to leverage the
abundant unlabeled data. In this work, we propose a universal foundation model
for medical image analysis that processes images with heterogeneous spatial
properties using a unified structure. To accomplish this, we propose spatially
adaptive networks (SPAD-Nets), a family of networks that dynamically adjust the
structures to adapt to the spatial properties of input images, to build such a
universal foundation model. We pre-train a spatial adaptive visual tokenizer
(SPAD-VT) and then a spatial adaptive Vision Transformer (SPAD-ViT) via masked
image modeling (MIM) on 55 public medical image datasets. The pre-training data
comprises over 9 million image slices, representing the largest, most
comprehensive, and most diverse dataset to our knowledge for pre-training
universal foundation models for medical image analysis. The experimental
results on downstream medical image classification and segmentation tasks
demonstrate the superior performance and label efficiency of our model. Our
code is available at https://github.com/function2-llx/PUMIT.
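For intuition, the sketch below shows the general shape of this recipe: a patch embedding that adapts its depth patching to whether the input is a thin slice stack or a full volume, plus a BEiT-style MIM step that predicts a frozen tokenizer's codes for masked patches. Everything here (class names, the toy tokenizer, the dimensions) is an illustrative assumption, not the authors' API; the real implementation is in the repository linked above.

```python
# Hypothetical sketch of MIM pre-training with a patch embedding that adapts
# to heterogeneous spatial shapes (2D slices vs. 3D volumes). Illustrative
# only; the authors' actual implementation is in the linked repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptivePatchEmbed(nn.Module):
    """Patchify (B, C, D, H, W) inputs, collapsing the depth kernel when the
    input is too thin to be patched along depth."""
    def __init__(self, in_ch=1, dim=384, patch_hw=16, patch_d=4):
        super().__init__()
        self.patch_d = patch_d
        self.weight = nn.Parameter(
            torch.randn(dim, in_ch, patch_d, patch_hw, patch_hw) * 0.02)
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        ph = self.weight.shape[-1]
        if x.shape[2] >= self.patch_d:   # genuine volume: patch along depth
            w, sd = self.weight, self.patch_d
        else:                            # single slice or thin stack
            w, sd = self.weight.mean(dim=2, keepdim=True), 1
        tokens = F.conv3d(x, w, self.bias, stride=(sd, ph, ph))
        return tokens.flatten(2).transpose(1, 2)          # (B, N, dim)

vocab, dim = 512, 384
embed = SpatiallyAdaptivePatchEmbed(dim=dim)
mask_token = nn.Parameter(torch.zeros(1, 1, dim))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True), num_layers=2)
head = nn.Linear(dim, vocab)

def toy_tokenizer(x):
    """Stand-in for the frozen visual tokenizer: patches -> discrete ids."""
    with torch.no_grad():
        return (embed(x).abs().sum(-1) * 7).long() % vocab

def mim_step(x, mask_ratio=0.4):
    """BEiT-style objective: predict the tokenizer's codes at masked patches."""
    target = toy_tokenizer(x)                             # (B, N)
    patches = embed(x)                                    # (B, N, dim)
    mask = torch.rand(patches.shape[:2]) < mask_ratio     # (B, N) bool
    patches = torch.where(mask[..., None], mask_token, patches)
    logits = head(encoder(patches))                       # (B, N, vocab)
    return F.cross_entropy(logits[mask], target[mask])

loss_2d = mim_step(torch.randn(2, 1, 1, 64, 64))    # a batch of 2D slices
loss_3d = mim_step(torch.randn(2, 1, 16, 64, 64))   # a batch of 3D volumes
```

The same embedding and encoder handle both calls, which is the point of the unified structure the abstract describes.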
Related papers
- Universal Medical Imaging Model for Domain Generalization with Data Privacy [2.8727695958743364]
We propose a federated learning approach to transfer knowledge from multiple local models to a global model.
The primary objective is to train a global model capable of performing a wide variety of medical imaging tasks.
arXiv Detail & Related papers (2024-07-20T01:24:15Z)
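The entry above does not name an aggregation rule; the sketch below assumes a FedAvg-style weighted average of local weights, a common way to merge local models into a global one. Names and weighting are hypothetical, not the paper's published method.

```python
# FedAvg-style aggregation sketch. The abstract only states that knowledge
# from multiple local models is transferred to a global model; the weighted
# state-dict average below is an assumed, common instantiation.
import copy
import torch
import torch.nn as nn

def federated_average(global_model, local_states, num_samples):
    """Merge per-site state_dicts, weighting each site by its data volume."""
    total = sum(num_samples)
    merged = copy.deepcopy(local_states[0])
    for key in merged:
        merged[key] = sum(s[key].float() * (n / total)
                          for s, n in zip(local_states, num_samples))
    global_model.load_state_dict(merged)
    return global_model

sites = [nn.Linear(8, 2) for _ in range(3)]        # toy per-site models
global_model = federated_average(nn.Linear(8, 2),
                                 [m.state_dict() for m in sites],
                                 num_samples=[120, 300, 80])
```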
- Boosting Medical Image Segmentation Performance with Adaptive Convolution Layer [6.887244952811574]
We propose an adaptive layer placed ahead of leading deep-learning models such as UCTransNet.
Our approach enhances the network's ability to handle diverse anatomical structures and subtle image details.
It consistently outperforms traditional CNNs with fixed kernel sizes while using a similar number of parameters.
arXiv Detail & Related papers (2024-04-17T13:18:39Z)
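The entry above leaves the adaptation mechanism unspecified; one plausible reading, sketched below under that assumption, is a front layer that blends parallel convolutions of several kernel sizes with learned mixing weights.

```python
# Minimal sketch of one plausible "adaptive kernel size" front layer:
# parallel convolutions at several kernel sizes blended by learned softmax
# weights. The exact mechanism in the paper may differ.
import torch
import torch.nn as nn

class AdaptiveConvFront(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes)
        self.logits = nn.Parameter(torch.zeros(len(kernel_sizes)))

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)   # learned kernel-size mixture
        return sum(wi * b(x) for wi, b in zip(w, self.branches))

# placed ahead of a segmentation backbone, e.g.:
# model = nn.Sequential(AdaptiveConvFront(3, 3), backbone)
out = AdaptiveConvFront(3, 3)(torch.randn(2, 3, 128, 128))
```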
- Generative Medical Segmentation [5.4613210257624605]
Generative Medical Segmentation (GMS) is a novel approach that leverages a generative model to perform image segmentation.
GMS employs a robust pre-trained vision foundation model to extract latent representations for images and corresponding ground truth masks.
The design of GMS leads to fewer trainable parameters in the model, which reduces the risk of overfitting and enhances its generalization capability.
arXiv Detail & Related papers (2024-03-27T02:16:04Z)
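A rough illustration of the design summarized in the GMS entry above: a frozen pre-trained encoder supplies latent features and only a small decoder is trained. The ImageNet ResNet-18 stand-in and the toy decoder are assumptions, not the paper's actual components.

```python
# Sketch of the "frozen foundation features + small trainable head" pattern.
# Encoder and decoder choices here are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights="IMAGENET1K_V1")
encoder = nn.Sequential(*list(encoder.children())[:-2])  # keep spatial map
for p in encoder.parameters():
    p.requires_grad = False                   # frozen foundation features

decoder = nn.Sequential(                      # the only trainable part
    nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
    nn.Conv2d(64, 1, 1))                      # binary mask logits

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    feats = encoder(x)                        # (2, 512, 7, 7)
mask_logits = decoder(feats)                  # (2, 1, 224, 224)
```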
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
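The entry above leaves the mechanism open; the sketch below assumes one plausible reading consistent with "no external training data": a short self-supervised warm-up on the task's own volumes whose weights then serve as the initialization. This is an assumption, not the paper's recipe.

```python
# Assumed reading of "learnable initialization with no external data":
# briefly pre-train the encoder on the task's own volumes with a masked
# reconstruction objective, then reuse the weights as initialization.
import torch
import torch.nn as nn

encoder = nn.Sequential(               # stand-in for a hybrid model's encoder
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for vol in [torch.randn(1, 1, 16, 64, 64) for _ in range(4)]:
    keep = (torch.rand_like(vol) > 0.5).float()      # random voxel mask
    recon = encoder(vol * keep)                      # predict from kept voxels
    loss = ((recon - vol) ** 2 * (1 - keep)).mean()  # score masked voxels only
    opt.zero_grad()
    loss.backward()
    opt.step()

init_weights = encoder.state_dict()  # initialization for supervised training
```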
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, to the extent that it can achieve the same performance with as little as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
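As a minimal illustration of the cellular-graph idea in the AMIGO entry above, the sketch below builds a k-nearest-neighbour graph over cell centroids and runs one round of mean-aggregation message passing; the published model's graph construction and transformer differ.

```python
# Toy cellular graph: k-nearest-neighbour edges over cell centroids, then
# one round of mean-aggregation message passing. Illustrative only.
import torch

def cell_graph(centroids, k=5):
    """centroids: (N, 2) cell positions -> (N, k) neighbour indices."""
    d = torch.cdist(centroids, centroids)      # pairwise distances
    d.fill_diagonal_(float("inf"))             # exclude self-loops
    return d.topk(k, largest=False).indices

centroids = torch.rand(100, 2) * 1000          # toy cell positions
neigh = cell_graph(centroids)                  # (100, 5)

feats = torch.randn(100, 32)                   # per-cell features
agg = feats[neigh].mean(dim=1)                 # neighbour average, (100, 32)
feats = 0.5 * (feats + agg)                    # simple message-passing update
```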
- Enhancing MR Image Segmentation with Realistic Adversarial Data Augmentation [17.539828821476224]
We propose an adversarial data augmentation approach to improve the efficiency of utilizing training data.
We present a generic task-driven learning framework that jointly optimizes a data augmentation model and a segmentation network during training.
The proposed adversarial data augmentation does not rely on generative networks and can be used as a plug-in module in general segmentation networks.
arXiv Detail & Related papers (2021-08-07T11:32:37Z)
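The joint, task-driven optimization described in the entry above has a natural min-max form: the augmentor is updated to increase the segmentation loss while the segmenter is updated to decrease it. The toy networks below are stand-ins for illustration, not the paper's architectures.

```python
# Min-max sketch of task-driven adversarial augmentation: the augmentor
# ascends the segmentation loss, the segmenter descends it.
import torch
import torch.nn as nn
import torch.nn.functional as F

aug = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Tanh())  # perturbation
seg = nn.Conv2d(1, 2, 3, padding=1)                            # toy segmenter
opt_aug = torch.optim.Adam(aug.parameters(), lr=1e-4)
opt_seg = torch.optim.Adam(seg.parameters(), lr=1e-3)

x, y = torch.randn(4, 1, 64, 64), torch.randint(0, 2, (4, 64, 64))

# augmentor step: ascend the segmentation loss (note the minus sign)
x_adv = x + 0.1 * aug(x)
loss_adv = F.cross_entropy(seg(x_adv), y)
opt_aug.zero_grad()
(-loss_adv).backward()
opt_aug.step()

# segmenter step: descend the loss on freshly perturbed inputs
x_adv = x + 0.1 * aug(x).detach()
loss_seg = F.cross_entropy(seg(x_adv), y)
opt_seg.zero_grad()
loss_seg.backward()
opt_seg.step()
```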
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
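A simplified sketch of attention along a single axis with a learnable gate on the positional term, in the spirit of the gated axial-attention entry above; the published module uses multi-head attention with separate gated positional encodings for queries, keys, and values.

```python
# Simplified gated axial attention along one axis. The gate scales a learned
# positional bias; the paper's module is multi-head with richer encodings.
import torch
import torch.nn as nn

class GatedAxialAttention1D(nn.Module):
    """Self-attention along one axis (here: width) with a gated
    positional bias."""
    def __init__(self, dim, length):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.pos = nn.Parameter(torch.zeros(length, length))  # positional bias
        self.gate = nn.Parameter(torch.tensor(0.0))           # learnable gate

    def forward(self, x):                  # x: (B, W, dim), one row at a time
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        attn = attn + torch.sigmoid(self.gate) * self.pos     # gated position
        return torch.softmax(attn, dim=-1) @ v

x = torch.randn(2, 32, 64)                # (batch, width, channels)
out = GatedAxialAttention1D(64, 32)(x)    # apply along H then W in practice
```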
- DoFE: Domain-oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets [96.92018649136217]
We present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains.
Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains.
Our framework generates satisfying segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.
arXiv Detail & Related papers (2020-10-13T07:28:39Z)
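A minimal sketch of the "domain prior knowledge" idea in the DoFE entry above: a small pool of learned domain embeddings from which an attention-weighted prior is added to the image features. Dimensions and layout are illustrative, not the DoFE implementation.

```python
# Toy domain knowledge pool: enrich features with an attention-weighted
# mixture of learned multi-source domain embeddings. Illustrative only.
import torch
import torch.nn as nn

class DomainKnowledgePool(nn.Module):
    def __init__(self, num_domains=3, dim=256):
        super().__init__()
        self.pool = nn.Parameter(torch.randn(num_domains, dim) * 0.02)
        self.score = nn.Linear(dim, num_domains)

    def forward(self, feat):                       # feat: (B, dim)
        w = torch.softmax(self.score(feat), -1)    # affinity to each domain
        prior = w @ self.pool                      # (B, dim) aggregated prior
        return feat + prior                        # enriched feature

feat = torch.randn(4, 256)                         # pooled CNN features
enriched = DomainKnowledgePool()(feat)
```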
- Realistic Adversarial Data Augmentation for MR Image Segmentation [17.951034264146138]
We propose an adversarial data augmentation method for training neural networks for medical image segmentation.
Our model generates plausible and realistic signal corruptions, modeling the intensity inhomogeneities caused by a common type of artefact in MR imaging: the bias field.
We show that such an approach can improve the generalization ability and robustness of models, as well as provide significant improvements in low-data scenarios.
arXiv Detail & Related papers (2020-06-23T20:43:18Z)
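The corruption described in the entry above can be sketched as a smooth multiplicative field upsampled from a few control points; in the adversarial version those control points would be optimized to increase the segmentation loss rather than sampled at random. The control-point layout below is an assumption.

```python
# Toy multiplicative bias-field corruption: a smooth low-frequency intensity
# field upsampled from a small grid of control points. In the adversarial
# setting the control points would be optimized, not sampled randomly.
import torch
import torch.nn.functional as F

def bias_field(img, strength=0.3, grid=4):
    """img: (B, 1, H, W) MR slice -> slice with smooth intensity modulation."""
    b, _, h, w = img.shape
    ctrl = 1.0 + strength * (2 * torch.rand(b, 1, grid, grid) - 1)
    field = F.interpolate(ctrl, size=(h, w), mode="bicubic",
                          align_corners=True)
    return img * field

x = torch.randn(2, 1, 128, 128).abs()   # toy MR slices
x_corrupted = bias_field(x)
```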
- Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset, whose images were captured with different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)