STU-Net: Scalable and Transferable Medical Image Segmentation Models
Empowered by Large-Scale Supervised Pre-training
- URL: http://arxiv.org/abs/2304.06716v1
- Date: Thu, 13 Apr 2023 17:59:13 GMT
- Title: STU-Net: Scalable and Transferable Medical Image Segmentation Models
Empowered by Large-Scale Supervised Pre-training
- Authors: Ziyan Huang, Haoyu Wang, Zhongying Deng, Jin Ye, Yanzhou Su, Hui Sun,
Junjun He, Yun Gu, Lixu Gu, Shaoting Zhang and Yu Qiao
- Abstract summary: We design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion.
We train our scalable STU-Net models on the large-scale TotalSegmentator dataset and find that increasing model size brings stronger performance gains.
We observe good performance of our pre-trained model in both direct inference and fine-tuning.
- Score: 43.04882328763337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale models pre-trained on large-scale datasets have profoundly
advanced the development of deep learning. However, the state-of-the-art models
for medical image segmentation are still small-scale, with their parameters
only in the tens of millions. Further scaling them up to higher orders of
magnitude is rarely explored. An overarching goal of exploring large-scale
models is to train them on large-scale medical segmentation datasets for better
transfer capacities. In this work, we design a series of Scalable and
Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14
million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image
segmentation model to date. Our STU-Net is based on the nnU-Net framework due to
its popularity and impressive performance. We first refine the default
convolutional blocks in nnU-Net to make them scalable. Then, we empirically
evaluate different scaling combinations of network depth and width, discovering
that it is optimal to scale model depth and width together. We train our
scalable STU-Net models on the large-scale TotalSegmentator dataset and find
that increasing model size brings stronger performance gains. This observation
indicates that large models are promising for medical image segmentation.
Furthermore, we evaluate the transferability of our model on 14 downstream
datasets for direct inference and 3 datasets for further fine-tuning, covering
various modalities and segmentation targets. We observe good performance of our
pre-trained model in both direct inference and fine-tuning. The code and
pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net.
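The abstract above only outlines the design, but its central scaling recipe (widen the channels and deepen the stages together) is easy to illustrate. The PyTorch snippet below is a minimal, assumption-laden sketch, not the authors' STU-Net code (that is in the linked repository): the residual block layout, the `width_mult` and `depth` parameters, and the channel sizes are all hypothetical.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Residual 3D conv block; the skip connection is assumed here
    to keep very deep variants trainable."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
        )
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv3d(in_ch, out_ch, 1)
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

def make_stage(in_ch, base_ch, width_mult=1.0, depth=2):
    """Scale width (channels per stage) and depth (blocks per stage) together."""
    out_ch = int(base_ch * width_mult)
    blocks = [ConvBlock(in_ch, out_ch)]
    blocks += [ConvBlock(out_ch, out_ch) for _ in range(depth - 1)]
    return nn.Sequential(*blocks), out_ch

# Example: one encoder stage at 2x width and 2x the default depth.
stage, out_ch = make_stage(in_ch=32, base_ch=32, width_mult=2.0, depth=4)
x = torch.randn(1, 32, 16, 16, 16)
print(stage(x).shape)  # torch.Size([1, 64, 16, 16, 16])
```

Since the parameter count of a convolution grows roughly quadratically with channel width and linearly with the number of blocks, scaling both axes together can plausibly span the 14M-to-1.4B parameter range the abstract describes.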
Related papers
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes model diversity and collaboration, achieved through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream [3.4526439922541705]
We evaluate scaling laws for modeling the primate visual ventral stream (VVS).
We observe that while behavioral alignment continues to scale with larger models, neural alignment saturates.
Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit only poor alignment.
arXiv Detail & Related papers (2024-11-08T17:13:53Z)
- More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
arXiv Detail & Related papers (2024-04-30T12:05:48Z)
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and a low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- When Do We Not Need Larger Vision Models? [55.957626371697785]
Scaling up the size of vision models has been the de facto standard to obtain more powerful visual representations.
We demonstrate the power of Scaling on Scales (S$^2$), whereby a pre-trained and frozen smaller vision model can outperform larger models.
We release a Python package that can apply S$^2$ on any vision model with one line of code (a rough sketch of this multi-scale idea appears after this list).
arXiv Detail & Related papers (2024-03-19T17:58:39Z)
- Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
arXiv Detail & Related papers (2023-05-24T15:33:46Z)
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z)
- ScaleNet: Searching for the Model to Scale [44.05380012545087]
We propose ScaleNet to jointly search for the base model and the scaling strategy.
We show that our scaled networks achieve significant performance gains across various FLOPs budgets.
arXiv Detail & Related papers (2022-07-15T03:16:43Z)
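Among the entries above, Scaling on Scales (S$^2$) lends itself to a short illustration: a single frozen model is run over several scales of the input, and the resulting features are combined. The sketch below is a loose approximation under stated assumptions; the actual method tiles larger scales into sub-images of the original input size, and the `backbone` interface, scale choices, and pooling here are illustrative rather than the released package's API.

```python
import torch
import torch.nn.functional as F

def multiscale_features(backbone, x, scales=(1.0, 2.0)):
    """Run a frozen backbone on several scales of the input and
    concatenate pooled feature maps channel-wise (S^2-style sketch)."""
    feats = []
    with torch.no_grad():
        for s in scales:
            xs = F.interpolate(x, scale_factor=s, mode="bilinear",
                               align_corners=False)
            f = backbone(xs)                             # (B, C, H', W') feature map
            f = F.adaptive_avg_pool2d(f, output_size=7)  # pool to a common spatial size
            feats.append(f)
    return torch.cat(feats, dim=1)                       # (B, C * len(scales), 7, 7)

# Toy "backbone" standing in for a frozen pre-trained vision model.
backbone = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
x = torch.randn(2, 3, 224, 224)
print(multiscale_features(backbone, x).shape)  # torch.Size([2, 16, 7, 7])
```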