Unified 2D and 3D Pre-training for Medical Image Classification and Segmentation
- URL: http://arxiv.org/abs/2112.09356v1
- Date: Fri, 17 Dec 2021 07:27:23 GMT
- Title: Unified 2D and 3D Pre-training for Medical Image Classification and Segmentation
- Authors: Yutong Xie, Jianpeng Zhang, Yong Xia, Qi Wu
- Abstract summary: We propose a Universal Self-Supervised Transformer (USST) framework based on the student-teacher paradigm.
USST aims to leverage a huge amount of unlabeled medical data with multiple dimensions to learn rich representations.
It provides promising results on six 2D/3D medical image classification and segmentation tasks.
- Score: 40.01443481859121
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning (SSL) opens up huge opportunities for better
utilizing unlabeled data. This is essential for medical image analysis, which is
generally known for its lack of annotations. However, when we attempt to use as
many unlabeled medical images as possible in SSL, breaking the dimension
barrier (i.e., making it possible to jointly use both 2D and 3D images) becomes
a must. In this paper, we propose a Universal Self-Supervised Transformer
(USST) framework based on the student-teacher paradigm, aiming to leverage a
huge amount of unlabeled medical data with multiple dimensions to learn rich
representations. To achieve this, we design a Pyramid Transformer U-Net (PTU)
as the backbone, which is composed of switchable patch embedding (SPE) layers
and Transformer layers. The SPE layer switches to either 2D or 3D patch
embedding depending on the input dimension. After that, the images are
converted to a sequence regardless of their original dimensions. The
Transformer layer then models the long-term dependencies in a
sequence-to-sequence manner, thus enabling USST to learn representations from
both 2D and 3D images. USST has two obvious merits compared to current
dimension-specific SSL: (1) more effective - it can learn representations
from larger and more diverse data; and (2) more versatile - it can be
transferred to various downstream tasks. Experiments show that USST achieves
promising results on six 2D/3D medical image classification and segmentation
tasks, substantially outperforming supervised ImageNet pre-training and
advanced SSL counterparts.
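To make the dimension switch concrete, here is a minimal sketch of an SPE-style layer (PyTorch assumed; the class name, channel sizes, and patch size are illustrative choices, not the authors' implementation). It routes 2D and 3D inputs through separate patch projections and flattens either result into the same kind of token sequence:

```python
import torch
import torch.nn as nn

class SwitchablePatchEmbed(nn.Module):
    """Sketch of a switchable patch embedding: 2D or 3D projection by input rank."""

    def __init__(self, in_ch=1, embed_dim=96, patch=4):
        super().__init__()
        # One projection per input dimensionality; only one runs per forward pass.
        self.proj2d = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        self.proj3d = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        # x: (B, C, H, W) for 2D images or (B, C, D, H, W) for 3D volumes.
        if x.dim() == 4:
            feat = self.proj2d(x)            # (B, E, H', W')
        elif x.dim() == 5:
            feat = self.proj3d(x)            # (B, E, D', H', W')
        else:
            raise ValueError(f"expected 4D or 5D input, got {x.dim()}D")
        # Flatten spatial axes so 2D and 3D inputs yield the same token shape.
        return feat.flatten(2).transpose(1, 2)  # (B, N_tokens, E)

if __name__ == "__main__":
    spe = SwitchablePatchEmbed()
    print(spe(torch.randn(2, 1, 64, 64)).shape)      # (2, 256, 96)
    print(spe(torch.randn(2, 1, 32, 32, 32)).shape)  # (2, 512, 96)
```

Because both branches emit (B, N, E) token sequences, the Transformer layers after the SPE can be shared across dimensions, which is what lets one pre-training run consume 2D and 3D data together.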
Related papers
- Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation [68.60747298865394]
We propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D).
Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data.
This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis.
arXiv Detail & Related papers (2024-06-03T02:57:25Z) - Med3DInsight: Enhancing 3D Medical Image Understanding with 2D
- Med3DInsight: Enhancing 3D Medical Image Understanding with 2D Multi-Modal Large Language Models [1.64647940449869]
Existing 3D convolution and transformer-based methods have limited semantic understanding of an image volume.
We propose Med3DInsight, which marries existing 3D image encoders with 2D MLLMs and bridges them via a Plane-Slice-Aware Transformer (PSAT) module.
arXiv Detail & Related papers (2024-03-08T08:15:53Z) - Promise:Prompt-driven 3D Medical Image Segmentation Using Pretrained
Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentation.
arXiv Detail & Related papers (2023-10-30T16:49:03Z) - Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view approach to vertebra localization and identification from CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method can learn the multi-view global information naturally.
arXiv Detail & Related papers (2023-07-24T14:43:07Z) - M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical
Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$^2$SNet) to handle diverse segmentation tasks on medical images; a sketch of its basic subtraction unit follows this entry.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets of four different medical image segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:26:49Z) - PCRLv2: A Unified Visual Information Preservation Framework for
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z) - Joint Self-Supervised Image-Volume Representation Learning with
- Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering [31.52291149830299]
Self-supervised learning can overcome the lack of labeled training samples by learning feature representations from unlabeled data.
Most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes.
We propose a novel framework for unsupervised joint learning on 2D and 3D data modalities.
arXiv Detail & Related papers (2022-12-04T18:57:44Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Uni4Eye: Unified 2D and 3D Self-supervised Pre-training via Masked Image
Modeling Transformer for Ophthalmic Image Classification [1.2250035750661867]
We propose a universal self-supervised Transformer framework, named Uni4Eye, to capture domain-specific feature embedding in ophthalmic images.
Uni4Eye can serve as a global feature extractor, which builds its basis on a Masked Image Modeling task with a Vision Transformer architecture.
We employ a Unified Patch Embedding module to replace the original patch embedding module in ViT for jointly processing both 2D and 3D input images.
arXiv Detail & Related papers (2022-03-09T10:02:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.