The Missing Piece: A Case for Pre-Training in 3D Medical Object Detection
- URL: http://arxiv.org/abs/2509.15947v1
- Date: Fri, 19 Sep 2025 12:55:51 GMT
- Title: The Missing Piece: A Case for Pre-Training in 3D Medical Object Detection
- Authors: Katharina Eckstein, Constantin Ulrich, Michael Baumgartner, Jessica Kächele, Dimitrios Bounias, Tassilo Wald, Ralf Floca, Klaus H. Maier-Hein
- Abstract summary: Large-scale pre-training holds the promise to advance 3D medical object detection. Existing pre-training approaches for 3D object detection rely on 2D medical data or natural image pre-training. We present the first systematic study of how existing pre-training methods can be integrated into state-of-the-art detection architectures.
- Score: 7.065674915708414
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large-scale pre-training holds the promise to advance 3D medical object detection, a crucial component of accurate computer-aided diagnosis. Yet, it remains underexplored compared to segmentation, where pre-training has already demonstrated significant benefits. Existing pre-training approaches for 3D object detection rely on 2D medical data or natural image pre-training, failing to fully leverage 3D volumetric information. In this work, we present the first systematic study of how existing pre-training methods can be integrated into state-of-the-art detection architectures, covering both CNNs and Transformers. Our results show that pre-training consistently improves detection performance across various tasks and datasets. Notably, reconstruction-based self-supervised pre-training outperforms supervised pre-training, while contrastive pre-training provides no clear benefit for 3D medical object detection. Our code is publicly available at: https://github.com/MIC-DKFZ/nnDetection-finetuning.
Related papers
- Medical Semantic Segmentation with Diffusion Pretrain [1.9415817267757087]
Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks. We propose a novel pretraining strategy using diffusion models with anatomical guidance, tailored to the intricacies of 3D medical image data. We employ an additional model that predicts 3D universal body-part coordinates, providing guidance during the diffusion process.
arXiv Detail & Related papers (2025-01-31T16:25:49Z)
- An OpenMind for 3D medical vision self-supervised learning [1.1223322894276315]
We publish the largest publicly available pre-training dataset comprising 114k 3D brain MRI volumes. We benchmark existing 3D self-supervised learning methods on this dataset for a state-of-the-art CNN and Transformer architecture.
arXiv Detail & Related papers (2024-12-22T14:38:28Z)
- Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning [47.700298779672366]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z)
- Primitive Geometry Segment Pre-training for 3D Medical Image Segmentation [12.251689154843342]
We present the Primitive Geometry Segment Pre-training (PrimGeoSeg) method to enable the learning of 3D semantic features.
PrimGeoSeg performs more accurate and efficient 3D medical image segmentation without manual data collection and annotation.
arXiv Detail & Related papers (2024-01-08T04:37:35Z)
- 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z)
- Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
arXiv Detail & Related papers (2023-04-02T14:46:58Z)
- Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation [57.206129938611454]
Self-supervised representation learning (SSL) methods have outperformed the ImageNet classification pre-training for vision tasks such as object detection.
We empirically study and analyze the effects of SSL and compare it with other pre-training alternatives for 3DHPSE.
Our observations challenge the naive application of the current SSL pre-training to 3DHPSE and relight the value of other data types in the pre-training aspect.
arXiv Detail & Related papers (2023-03-09T16:17:52Z)
- Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
arXiv Detail & Related papers (2022-01-05T03:11:21Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.