Unified 2D and 3D Pre-training for Medical Image Classification and Segmentation
- URL: http://arxiv.org/abs/2112.09356v1
- Date: Fri, 17 Dec 2021 07:27:23 GMT
- Title: Unified 2D and 3D Pre-training for Medical Image Classification and Segmentation
- Authors: Yutong Xie, Jianpeng Zhang, Yong Xia, Qi Wu
- Abstract summary: We propose a Universal Self-Supervised Transformer (USST) framework based on the student-teacher paradigm.
USST aims to leverage a huge amount of unlabeled medical data with multiple dimensions to learn rich representations.
It provides promising results on six 2D/3D medical image classification and segmentation tasks.
- Score: 40.01443481859121
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning (SSL) opens up huge opportunities for better
utilizing unlabeled data. This is essential for medical image analysis, which is
generally known for its lack of annotations. However, when we attempt to use as
many unlabeled medical images as possible in SSL, breaking the dimension
barrier (i.e., making it possible to jointly use both 2D and 3D images) becomes
a must. In this paper, we propose a Universal Self-Supervised Transformer
(USST) framework based on the student-teacher paradigm, aiming to leverage a
huge amount of unlabeled medical data with multiple dimensions to learn rich
representations. To achieve this, we design a Pyramid Transformer U-Net (PTU)
as the backbone, which is composed of switchable patch embedding (SPE) layers
and Transformer layers. The SPE layer switches to either 2D or 3D patch
embedding depending on the input dimension. After that, the images are
converted to a sequence regardless of their original dimensions. The
Transformer layer then models the long-term dependencies in a
sequence-to-sequence manner, thus enabling USST to learn representations from
both 2D and 3D images. USST has two obvious merits compared to current
dimension-specific SSL: (1) more effective - it can learn representations
from larger and more diverse data; and (2) more versatile - it can be
transferred to various downstream tasks. Experiments show that USST achieves
promising results on six 2D/3D medical image classification and segmentation
tasks, substantially outperforming supervised ImageNet pre-training and
advanced SSL counterparts.
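To make the dimension switch concrete, here is a minimal sketch of an SPE-style layer (PyTorch assumed; the class name, channel sizes, and patch size are illustrative choices, not the authors' implementation). It routes 2D and 3D inputs through separate patch projections and flattens either result into the same kind of token sequence:

```python
import torch
import torch.nn as nn

class SwitchablePatchEmbed(nn.Module):
    """Sketch of a switchable patch embedding: 2D or 3D projection by input rank."""

    def __init__(self, in_ch=1, embed_dim=96, patch=4):
        super().__init__()
        # One projection per input dimensionality; only one runs per forward pass.
        self.proj2d = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        self.proj3d = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        # x: (B, C, H, W) for 2D images or (B, C, D, H, W) for 3D volumes.
        if x.dim() == 4:
            feat = self.proj2d(x)            # (B, E, H', W')
        elif x.dim() == 5:
            feat = self.proj3d(x)            # (B, E, D', H', W')
        else:
            raise ValueError(f"expected 4D or 5D input, got {x.dim()}D")
        # Flatten spatial axes so 2D and 3D inputs yield the same token shape.
        return feat.flatten(2).transpose(1, 2)  # (B, N_tokens, E)

if __name__ == "__main__":
    spe = SwitchablePatchEmbed()
    print(spe(torch.randn(2, 1, 64, 64)).shape)      # (2, 256, 96)
    print(spe(torch.randn(2, 1, 32, 32, 32)).shape)  # (2, 512, 96)
```

Because both branches emit (B, N, E) token sequences, the Transformer layers after the SPE can be shared across dimensions, which is what lets one pre-training run consume 2D and 3D data together.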
Related papers
- Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation [68.60747298865394]
We propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D).
Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data.
This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis.
arXiv Detail & Related papers (2024-06-03T02:57:25Z) - Med3DInsight: Enhancing 3D Medical Image Understanding with 2D
- Med3DInsight: Enhancing 3D Medical Image Understanding with 2D Multi-Modal Large Language Models [1.64647940449869]
Existing 3D convolution and transformer-based methods have limited semantic understanding of an image volume.
We propose Med3DInsight, which marries existing 3D image encoders with 2D MLLMs and bridges them via a Plane-Slice-Aware Transformer (PSAT) module.
arXiv Detail & Related papers (2024-03-08T08:15:53Z) - Promise:Prompt-driven 3D Medical Image Segmentation Using Pretrained
Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentation.
arXiv Detail & Related papers (2023-10-30T16:49:03Z) - Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view approach to vertebra localization and identification from CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method can learn the multi-view global information naturally.
arXiv Detail & Related papers (2023-07-24T14:43:07Z) - M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical
Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$^2$SNet) to handle diverse segmentation tasks on medical images; a sketch of its basic subtraction unit follows this entry.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets of four different medical image segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:26:49Z) - PCRLv2: A Unified Visual Information Preservation Framework for
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z) - Joint Self-Supervised Image-Volume Representation Learning with
- Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering [31.52291149830299]
Self-supervised learning can overcome the lack of labeled training samples by learning feature representations from unlabeled data.
Most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes.
We propose a novel framework for unsupervised joint learning on 2D and 3D data modalities.
arXiv Detail & Related papers (2022-12-04T18:57:44Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Uni4Eye: Unified 2D and 3D Self-supervised Pre-training via Masked Image
Modeling Transformer for Ophthalmic Image Classification [1.2250035750661867]
We propose a universal self-supervised Transformer framework, named Uni4Eye, to capture domain-specific feature embedding in ophthalmic images.
Uni4Eye can serve as a global feature extractor, which builds its basis on a Masked Image Modeling task with a Vision Transformer architecture.
We employ a Unified Patch Embedding module to replace the original patch embedding module in ViT for jointly processing both 2D and 3D input images.
arXiv Detail & Related papers (2022-03-09T10:02:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.