Effect of large-scale pre-training on full and few-shot transfer
learning for natural and medical images
- URL: http://arxiv.org/abs/2106.00116v1
- Date: Mon, 31 May 2021 21:55:56 GMT
- Title: Effect of large-scale pre-training on full and few-shot transfer
learning for natural and medical images
- Authors: Mehdi Cherti and Jenia Jitsev
- Abstract summary: We conduct large-scale pre-training on large source datasets of either natural (ImageNet-21k/1k) or medical chest X-Ray images.
We compare full and few-shot transfer using different target datasets from both natural and medical imaging domains.
Our observations provide evidence that while pre-training and transfer between closely related datasets show a clear benefit from increasing model and data size during pre-training, such benefits are not clearly visible when the source and target datasets are further apart.
- Score: 2.030567625639093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning aims to exploit pre-trained models for more efficient
follow-up training on a wide range of downstream tasks and datasets, enabling
successful training even on small data. A recent line of work posits strong
benefits for model generalization and transfer when model size, data size, and
compute budget are increased during pre-training. However, it remains largely
unclear whether the transfer improvement observed with increasing scale also
holds when the source and target data distributions are far apart. In this work
we conduct large-scale pre-training on large source datasets of either natural
(ImageNet-21k/1k) or medical chest X-ray images and compare full and few-shot
transfer using different target datasets from both natural and medical imaging
domains. Our observations provide evidence that while pre-training and transfer
between closely related datasets show a clear benefit from increasing model and
data size during pre-training, such benefits are not clearly visible when the
source and target datasets are further apart. These observations hold for both
full and few-shot transfer and indicate that scaling laws which predict improved
generalization and transfer with increasing model and data size are incomplete:
to correctly predict the effect of varying model and data size during
pre-training on transfer, they should also take into account how distinct the
source and target data distributions are. (A repository for reproducing the
experiments will be made available.)
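To make the full versus few-shot transfer protocol described above concrete, the following is a minimal sketch of few-shot fine-tuning of an ImageNet-pretrained backbone on a k-shot subset of a target dataset. It is not the authors' released code; the PyTorch/torchvision setup, placeholder dataset path, ResNet-50 backbone, k=10, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of few-shot transfer: fine-tune an ImageNet-pretrained backbone
# on k examples per class of a target dataset. Paths, k, the backbone, and all
# hyperparameters are illustrative assumptions, not the paper's settings.
import random
from collections import defaultdict

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms


def k_shot_indices(targets, k, seed=0):
    """Pick k example indices per class from a list of integer labels."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)
    picked = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        picked.extend(idxs[:k])
    return picked


def main():
    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    # Target dataset in ImageFolder layout (placeholder path).
    train_set = datasets.ImageFolder("/path/to/target/train", transform=tfm)
    few_shot = Subset(train_set, k_shot_indices(train_set.targets, k=10))
    loader = DataLoader(few_shot, batch_size=32, shuffle=True, num_workers=4)

    # ImageNet-1k pre-trained backbone; replace the head for the target classes.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(10):  # a few epochs are usually enough in the few-shot regime
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()


if __name__ == "__main__":
    main()
```

Full transfer corresponds to running the same loop over the entire target training set instead of the k-shot subset.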
Related papers
- Enhancing pretraining efficiency for medical image segmentation via transferability metrics [0.0]
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge.
We introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data.
arXiv Detail & Related papers (2024-10-24T12:11:52Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- MoCo-Transfer: Investigating out-of-distribution contrastive learning for limited-data domains [52.612507614610244]
We analyze the benefit of transferring self-supervised contrastive representations from momentum contrast (MoCo) pretraining to settings with limited data (a minimal sketch of a MoCo-style contrastive objective follows this list).
We find that, depending on the quantity of labeled and unlabeled data, contrastive pretraining on larger out-of-distribution datasets can perform nearly as well as, or better than, MoCo pretraining in-domain.
arXiv Detail & Related papers (2023-11-15T21:56:47Z)
- Transfer learning from a sparsely annotated dataset of 3D medical images [4.477071833136902]
This study explores the use of transfer learning to improve the performance of deep convolutional neural networks for organ segmentation in medical imaging.
A base segmentation model was trained on a large and sparsely annotated dataset; its weights were used for transfer learning on four new downstream segmentation tasks.
The results showed that transfer learning from the base model was beneficial when small datasets were available.
arXiv Detail & Related papers (2023-11-08T21:31:02Z)
- On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z)
- The Role of Pre-training Data in Transfer Learning [20.768366728182997]
We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance.
We find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning.
arXiv Detail & Related papers (2023-02-27T09:10:08Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation and large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Scaling Laws for Transfer [0.5432984841650929]
We study scaling laws for transfer learning between distributions in an unsupervised, fine-tuning setting.
We find that the effective data transferred is described well in the low-data regime by a power law of parameter count and fine-tuning dataset size (see the power-law sketch after this list).
arXiv Detail & Related papers (2021-02-02T04:07:38Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
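The MoCo-Transfer entry above relies on momentum-contrast pretraining; to make "contrastive pretraining" concrete, here is a minimal sketch of a MoCo-style InfoNCE objective. It is not the code of that paper or of the original MoCo release; the ResNet-18 encoder, queue size, momentum, and temperature are illustrative assumptions, and queue maintenance is omitted for brevity.

```python
# Minimal sketch of a MoCo-style (momentum contrast) InfoNCE objective.
# Encoder choice, queue size, momentum, and temperature are illustrative
# assumptions; the enqueue/dequeue bookkeeping of the key queue is omitted.
import torch
import torch.nn.functional as F
from torch import nn
from torchvision import models


class MoCoSketch(nn.Module):
    def __init__(self, dim=128, queue_size=4096, momentum=0.999, temperature=0.07):
        super().__init__()
        self.m, self.t = momentum, temperature
        # Query and key encoders share an architecture; the key encoder is a
        # momentum-updated copy that receives no gradients.
        self.encoder_q = models.resnet18(num_classes=dim)
        self.encoder_k = models.resnet18(num_classes=dim)
        self.encoder_k.load_state_dict(self.encoder_q.state_dict())
        for p in self.encoder_k.parameters():
            p.requires_grad = False
        # Queue of unit-normalized negative keys, one key per column.
        self.register_buffer("queue", F.normalize(torch.randn(dim, queue_size), dim=0))

    @torch.no_grad()
    def _momentum_update(self):
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.mul_(self.m).add_(pq, alpha=1.0 - self.m)

    def forward(self, view_q, view_k):
        """InfoNCE loss for two augmented views of the same batch of images."""
        q = F.normalize(self.encoder_q(view_q), dim=1)          # queries
        with torch.no_grad():
            self._momentum_update()
            k = F.normalize(self.encoder_k(view_k), dim=1)      # positive keys
        l_pos = (q * k).sum(dim=1, keepdim=True)                # N x 1
        l_neg = q @ self.queue.clone().detach()                 # N x K
        logits = torch.cat([l_pos, l_neg], dim=1) / self.t
        # The positive key sits at index 0 of each row of logits.
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)
```

Pretraining minimizes this loss over pairs of augmented views; transfer then fine-tunes the query encoder on labeled target data, as in the few-shot sketch above.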
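The "Scaling Laws for Transfer" entry above describes effective data transferred as a power law of parameter count and fine-tuning dataset size. The sketch below illustrates that functional form only; the constants are placeholders, not fitted values from the paper.

```python
# Hypothetical numerical illustration of the power-law form reported in
# "Scaling Laws for Transfer": the effective data transferred D_T (the extra
# fine-tuning data a from-scratch model would need to match the pre-trained
# model) is modeled in the low-data regime as
#     D_T ~= k * (D_F ** alpha) * (N ** beta)
# where D_F is the fine-tuning dataset size and N is the parameter count.
# The constants k, alpha, beta below are placeholders, not values from the paper.


def effective_data_transferred(d_f: float, n_params: float,
                               k: float = 1.0, alpha: float = 0.3,
                               beta: float = 0.4) -> float:
    """Power-law estimate of effective data transferred (illustrative constants)."""
    return k * (d_f ** alpha) * (n_params ** beta)


if __name__ == "__main__":
    # Example: 10k fine-tuning examples and a 100M-parameter model.
    print(f"{effective_data_transferred(1e4, 1e8):.3g}")
```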