Revisiting MAE pre-training for 3D medical image segmentation
- URL: http://arxiv.org/abs/2410.23132v2
- Date: Mon, 02 Dec 2024 12:05:29 GMT
- Title: Revisiting MAE pre-training for 3D medical image segmentation
- Authors: Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko, Andrei Goncharov, Alberto Paderno, Leander Maerkisch, Paul F. Jäger, Klaus Maier-Hein,
- Abstract summary: Self-Supervised Learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets.
While SSL has revolutionized fields like natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls.
In this paper, we address these issues by i) leveraging a large-scale dataset of 39k 3D brain MRI volumes and ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework.
- Score: 0.08484806297945031
- Abstract: Self-Supervised Learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets for various downstream applications that suffer from the scarcity of labeled data. While SSL has revolutionized fields like natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls: small pre-training dataset sizes, architectures inadequate for 3D medical image analysis, and insufficient evaluation practices. In this paper, we address these issues by i) leveraging a large-scale dataset of 39k 3D brain MRI volumes and ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework. iii) A robust development framework, incorporating 5 development and 8 testing brain MRI segmentation datasets, allowed performance-driven design decisions to optimize the simple concept of Masked Auto Encoders (MAEs) for 3D CNNs. The resulting model not only surpasses previous SSL methods but also outperforms the strong nnU-Net baseline by an average of approximately 3 Dice points, setting a new state-of-the-art. Our code and models are made available here.
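The core pre-training objective here is masked auto-encoding applied directly to 3D CNN inputs. As a rough illustration of that concept only (not the authors' nnU-Net-based implementation; the patch size, mask ratio, stand-in network, and MSE loss below are all assumptions), the sketch hides random non-overlapping 3D patches of a volume and scores reconstruction only on the hidden voxels:

```python
import torch
import torch.nn.functional as F

def random_patch_mask(volume: torch.Tensor, patch: int = 16, mask_ratio: float = 0.6):
    """Hide a random subset of non-overlapping 3D patches.

    volume: (B, C, D, H, W) with D, H, W divisible by `patch`.
    Returns the corrupted volume and a voxel-wise mask (1 = hidden).
    """
    B, C, D, H, W = volume.shape
    gd, gh, gw = D // patch, H // patch, W // patch
    n_patches = gd * gh * gw
    n_masked = int(mask_ratio * n_patches)

    # Choose exactly `n_masked` patches per sample, uniformly at random.
    ids = torch.rand(B, n_patches, device=volume.device).argsort(dim=1)[:, :n_masked]
    hide = torch.zeros(B, n_patches, device=volume.device)
    hide.scatter_(1, ids, 1.0)

    # Expand the patch-level decision to a voxel-level mask.
    mask = hide.view(B, 1, gd, gh, gw)
    mask = mask.repeat_interleave(patch, 2).repeat_interleave(patch, 3).repeat_interleave(patch, 4)
    return volume * (1.0 - mask), mask

def mae_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """MSE computed only on the voxels that were hidden from the encoder."""
    return (F.mse_loss(pred, target, reduction="none") * mask).sum() / mask.sum().clamp(min=1.0)

# Toy usage with a stand-in encoder-decoder; any 3D CNN with a matching output shape works.
net = torch.nn.Sequential(
    torch.nn.Conv3d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv3d(16, 1, 3, padding=1),
)
mri = torch.randn(2, 1, 64, 64, 64)          # pretend brain MRI crops
corrupted, mask = random_patch_mask(mri)
loss = mae_loss(net(corrupted), mri, mask)
loss.backward()
```

In the paper this idea is tuned within the nnU-Net framework on a Residual Encoder U-Net; the sketch only conveys the masking-and-reconstruction mechanics.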
Related papers
- Generative Enhancement for 3D Medical Images [74.17066529847546]
We propose GEM-3D, a novel generative approach to the synthesis of 3D medical images.
Our method begins with a 2D slice, referred to as the informed slice, which serves as the patient prior, and propagates the generation process using a 3D segmentation mask.
By decomposing the 3D medical images into masks and patient prior information, GEM-3D offers a flexible yet effective solution for generating versatile 3D images.
arXiv Detail & Related papers (2024-03-19T15:57:04Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
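The entry above reconstructs the original image from a combination of local masking and low-level perturbations. A minimal sketch of that style of input corruption follows; the specific perturbation (additive Gaussian noise), patch size, and masking ratio are placeholders, not the paper's settings:

```python
import torch

def disrupt(volume: torch.Tensor, patch: int = 8, mask_ratio: float = 0.3,
            noise_std: float = 0.1) -> torch.Tensor:
    """Corrupt a 3D volume by local masking plus a low-level perturbation.

    Placeholder choices: small-patch dropout plus additive Gaussian noise;
    the cited paper's actual perturbation set and ratios may differ.
    volume: (B, C, D, H, W), spatial dims divisible by `patch`.
    """
    B, C, D, H, W = volume.shape

    # Local masking: drop a random subset of small patches.
    keep = (torch.rand(B, 1, D // patch, H // patch, W // patch,
                       device=volume.device) > mask_ratio).float()
    keep = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3).repeat_interleave(patch, 4)

    # Low-level perturbation: mild additive noise on the surviving voxels.
    noisy = volume + noise_std * torch.randn_like(volume)
    return noisy * keep

# Pre-training target: reconstruct `x` from `disrupt(x)` with any 3D encoder-decoder.
corrupted = disrupt(torch.randn(2, 1, 32, 64, 64))
```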
- Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation [4.5206601127476445]
We propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation.
Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score.
arXiv Detail & Related papers (2023-07-06T14:16:25Z)
- Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
arXiv Detail & Related papers (2023-04-02T14:46:58Z)
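One common way to act on the finding above is to initialize a 3D network from video (Kinetics) pretraining and adapt it to single-channel CT input. The sketch below does this with torchvision's 3D ResNet-18; it assumes a recent torchvision and is not necessarily the recipe used in the cited paper:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

def video_pretrained_ct_classifier(num_classes: int) -> nn.Module:
    """Adapt a Kinetics-pretrained 3D ResNet-18 to single-channel CT volumes."""
    model = r3d_18(weights="DEFAULT")            # Kinetics-400 video pretraining

    # CT has one channel; average the RGB filters of the stem convolution.
    old_conv = model.stem[0]
    new_conv = nn.Conv3d(1, old_conv.out_channels,
                         kernel_size=old_conv.kernel_size,
                         stride=old_conv.stride,
                         padding=old_conv.padding,
                         bias=False)
    with torch.no_grad():
        new_conv.weight.copy_(old_conv.weight.mean(dim=1, keepdim=True))
    model.stem[0] = new_conv

    # Replace the Kinetics classification head with the downstream head.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Usage: logits for a batch of one single-channel CT sub-volume.
logits = video_pretrained_ct_classifier(2)(torch.randn(1, 1, 32, 128, 128))
```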
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [110.07170245531464]
Current 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories.
Recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language.
We learn a unified representation of images, texts, and 3D point clouds by pre-training with object triplets from the three modalities.
arXiv Detail & Related papers (2022-12-10T01:34:47Z)
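A simplified view of the tri-modal pre-training above is a CLIP-style contrastive alignment over matched (image, text, point-cloud) embeddings. The sketch below shows that objective only; in ULIP itself the image and text encoders come from a frozen pre-trained CLIP model, which is omitted here, and the loss weighting is an assumption:

```python
import torch
import torch.nn.functional as F

def pairwise_clip_loss(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE between two batches of embeddings matched by index."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def triplet_alignment_loss(img_emb, txt_emb, pc_emb):
    """Pull point-cloud embeddings toward image and text embeddings of the same object."""
    return pairwise_clip_loss(pc_emb, img_emb) + pairwise_clip_loss(pc_emb, txt_emb)

# Usage with random stand-in embeddings for a batch of 8 objects:
loss = triplet_alignment_loss(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512))
```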
- Slice-level Detection of Intracranial Hemorrhage on CT Using Deep Descriptors of Adjacent Slices [0.31317409221921133]
We propose a new strategy to train slice-level classifiers on CT scans based on the descriptors of the adjacent slices along the axis.
We obtain a single model in the top 4% best-performing solutions of the RSNA Intracranial Hemorrhage dataset challenge.
The proposed method is general and can be applied to other 3D medical diagnosis tasks such as MRI imaging.
arXiv Detail & Related papers (2022-08-05T23:20:37Z)
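The adjacent-slice idea above can be sketched as a shared 2D encoder producing one descriptor per slice, with each slice classified from the concatenation of its own and its neighbours' descriptors. The backbone, descriptor size, and one-slice neighbourhood below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AdjacentSliceClassifier(nn.Module):
    """Classify each CT slice from its own descriptor plus its neighbours' descriptors."""

    def __init__(self, feat_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in 2D slice encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.head = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, scan: torch.Tensor) -> torch.Tensor:
        # scan: (S, 1, H, W) -- all slices of one CT volume along the axis.
        desc = self.encoder(scan)                                  # (S, F)
        prev = torch.cat([desc[:1], desc[:-1]], dim=0)             # replicate at the borders
        nxt = torch.cat([desc[1:], desc[-1:]], dim=0)
        return self.head(torch.cat([prev, desc, nxt], dim=1))      # (S, num_classes)

# Usage: per-slice logits for a 40-slice scan.
logits = AdjacentSliceClassifier()(torch.randn(40, 1, 128, 128))
```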
- Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
arXiv Detail & Related papers (2022-01-05T03:11:21Z)
- Improved Brain Age Estimation with Slice-based Set Networks [18.272915375351914]
We propose a new architecture for BrainAGE prediction.
The proposed architecture works by encoding each 2D slice in an MRI with a deep 2D-CNN model.
Next, it combines the information from these 2D-slice encodings using set networks or permutation invariant layers.
Experiments on the BrainAGE prediction problem, using the UK Biobank dataset, showed that the model with the permutation invariant layers trains faster and provides better predictions compared to other state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-08T18:54:15Z)
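The slice-set architecture above can be approximated by a Deep-Sets-style model: encode every 2D slice with a shared CNN, pool with a permutation-invariant operation (mean pooling here), and regress the target. Layer sizes in this sketch are placeholders, not the published configuration:

```python
import torch
import torch.nn as nn

class SliceSetRegressor(nn.Module):
    """Predict a scalar (e.g. brain age) from a set of 2D slice encodings."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.slice_encoder = nn.Sequential(       # shared 2D encoder applied per slice
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.regressor = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, slices: torch.Tensor) -> torch.Tensor:
        # slices: (B, S, 1, H, W) -- S axial slices per subject, order irrelevant.
        B, S = slices.shape[:2]
        feats = self.slice_encoder(slices.flatten(0, 1)).view(B, S, -1)
        pooled = feats.mean(dim=1)                # permutation invariant over the slice set
        return self.regressor(pooled).squeeze(-1)

# Usage: predicted age for 2 subjects with 30 slices each.
predicted_age = SliceSetRegressor()(torch.randn(2, 30, 1, 96, 96))
```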
- Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for the 3D DL models for 3D chest CT scans classification.
We also exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results.
arXiv Detail & Related papers (2021-01-14T03:45:01Z)
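The interpretability component mentioned above is standard Class Activation Mapping; applied to 3D feature maps it reduces to weighting the last convolutional features by the classifier weights of the chosen class. The sketch below shows only that computation, not the searched DNAS architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def class_activation_map_3d(feats: torch.Tensor, fc: nn.Linear, cls: int) -> torch.Tensor:
    """Plain CAM applied to 3D feature maps.

    feats: (B, C, d, h, w), output of the last conv block before global pooling.
    fc:    the final linear classifier that operates on globally pooled features.
    Returns a (B, d, h, w) relevance map for class `cls` (upsample to input size as needed).
    """
    weights = fc.weight[cls]                                       # (C,)
    cam = F.relu(torch.einsum("bcdhw,c->bdhw", feats, weights))    # weighted feature sum
    # Min-max normalize each map to [0, 1] for visualization.
    flat = cam.flatten(1)
    flat = (flat - flat.amin(1, keepdim=True)) / (flat.amax(1, keepdim=True) - flat.amin(1, keepdim=True) + 1e-8)
    return flat.view_as(cam)

# Usage with stand-in features and a 2-class head over 64 channels:
cam = class_activation_map_3d(torch.randn(1, 64, 4, 8, 8), nn.Linear(64, 2), cls=1)
```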
- Planar 3D Transfer Learning for End to End Unimodal MRI Unbalanced Data Segmentation [0.0]
We present a novel approach of 2D to 3D transfer learning based on mapping pre-trained 2D convolutional neural network weights into planar 3D kernels.
The method is validated by the proposed planar 3D res-u-net network with encoder transferred from the 2D VGG-16.
arXiv Detail & Related papers (2020-11-23T17:11:50Z)
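The 2D-to-3D mapping above can be sketched by lifting each pre-trained (k, k) kernel into a planar (1, k, k) kernel, so the filter acts slice-wise inside a 3D network. The helper below shows only this mechanical step; stride and padding handling is simplified relative to the paper:

```python
import torch
import torch.nn as nn

def planar_3d_from_2d(conv2d: nn.Conv2d) -> nn.Conv3d:
    """Turn a pre-trained 2D convolution into a planar 3D convolution."""
    kH, kW = conv2d.kernel_size
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(1, kH, kW),
                       stride=(1, *conv2d.stride),
                       padding=(0, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        conv3d.weight.copy_(conv2d.weight.unsqueeze(2))   # (O, I, 1, kH, kW)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# Usage: lift a VGG-16-style first layer and run it on a volume.
planar = planar_3d_from_2d(nn.Conv2d(3, 64, 3, padding=1))
out = planar(torch.randn(1, 3, 16, 64, 64))               # (1, 64, 16, 64, 64)
```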
- Volumetric Medical Image Segmentation: A 3D Deep Coarse-to-fine Framework and Its Adversarial Examples [74.92488215859991]
We propose a novel 3D-based coarse-to-fine framework to efficiently tackle these challenges.
The proposed 3D-based framework outperforms its 2D counterparts by a large margin, since it can leverage the rich spatial information along all three axes.
We conduct experiments on three datasets, the NIH pancreas dataset, the JHMI pancreas dataset and the JHMI pathological cyst dataset.
arXiv Detail & Related papers (2020-10-29T15:39:19Z)
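The coarse-to-fine pipeline above is typically implemented as: run a coarse model on the whole volume, crop a margin-padded bounding box around its prediction, and refine the crop with a fine model. The sketch below illustrates that flow with stand-in networks; the margin, threshold, and bounding-box handling are placeholders:

```python
import torch
import torch.nn as nn

def coarse_to_fine_segment(volume, coarse_net, fine_net, margin: int = 8, thr: float = 0.5):
    """Two-stage 3D segmentation: crop around the coarse mask, refine the crop.

    volume: (1, C, D, H, W); both networks return one-channel logits of the same spatial size.
    """
    coarse = torch.sigmoid(coarse_net(volume)) > thr              # (1, 1, D, H, W) bool
    if not coarse.any():                                          # nothing found: keep coarse result
        return coarse.float()

    # Bounding box of the coarse foreground, padded by a safety margin.
    idx = coarse[0, 0].nonzero()                                  # (N, 3) voxel coordinates
    lo = (idx.min(0).values - margin).clamp(min=0)
    hi = idx.max(0).values + margin + 1
    crop = volume[..., lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

    # Refine only the cropped region and paste it back into the full volume.
    fine = torch.sigmoid(fine_net(crop)) > thr
    out = torch.zeros_like(coarse)
    out[..., lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine
    return out.float()

# Usage with tiny stand-in networks that preserve spatial size:
coarse_net = nn.Conv3d(1, 1, 3, padding=1)
fine_net = nn.Conv3d(1, 1, 3, padding=1)
mask = coarse_to_fine_segment(torch.randn(1, 1, 32, 64, 64), coarse_net, fine_net)
```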
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.