Partial transfusion: on the expressive influence of trainable batch norm
parameters for transfer learning
- URL: http://arxiv.org/abs/2102.05543v1
- Date: Wed, 10 Feb 2021 16:29:03 GMT
- Title: Partial transfusion: on the expressive influence of trainable batch norm
parameters for transfer learning
- Authors: Fahdi Kanavati, Masayuki Tsuneki
- Abstract summary: Transfer learning from ImageNet is the go-to approach when applying deep learning to medical images.
Most modern architectures contain batch normalisation layers, and fine-tuning a model with such layers requires taking a few precautions.
We find that fine-tuning only the trainable weights of the batch normalisation layers leads to performance similar to fine-tuning all of the weights.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning from ImageNet is the go-to approach when applying deep
learning to medical images. The approach is either to fine-tune a pre-trained
model or use it as a feature extractor. Most modern architectures contain batch
normalisation layers, and fine-tuning a model with such layers requires taking
a few precautions as they consist of trainable and non-trainable weights and
have two operating modes: training and inference. Attention is primarily given
to the non-trainable weights used during inference, as they are the primary
source of unexpected behaviour or degradation in performance during transfer
learning. It is typically recommended to fine-tune the model with the batch
normalisation layers kept in inference mode during both training and inference.
In this paper, we pay closer attention instead to the trainable weights of the
batch normalisation layers, and we explore their expressive influence in the
context of transfer learning. We find that fine-tuning only the trainable
weights (scale and centre) of the batch normalisation layers leads to
performance similar to fine-tuning all of the weights, with the added benefit
of faster convergence. We demonstrate this on seven publicly available
medical imaging datasets, using four different model architectures.
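
To make the setup concrete, here is a minimal PyTorch sketch (our illustration, not the authors' released code) of the configuration the abstract describes: all weights are frozen except the batch-norm scale (gamma) and centre (beta) parameters and a fresh classification head, and the batch-norm layers are kept in inference mode so the pre-trained running statistics are reused. The ResNet-50 backbone, two-class head, and learning rate are example choices, not values from the paper.

```python
# Minimal sketch: fine-tune only the BN scale/centre of an ImageNet-pretrained
# model, keeping BN layers in inference mode (running statistics frozen).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)   # hypothetical 2-class task head

# Freeze everything, then re-enable gradients only for BN gamma/beta and the head.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.weight.requires_grad = True           # gamma (scale)
        m.bias.requires_grad = True             # beta (centre)
for p in model.fc.parameters():
    p.requires_grad = True

optimiser = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3  # assumed lr
)

def set_bn_inference_mode(net: nn.Module) -> None:
    """Keep BN layers in inference mode so running mean/var are not updated."""
    for m in net.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()

model.train()
set_bn_inference_mode(model)   # re-apply after every call to model.train()
```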
Related papers
- TrAct: Making First-layer Pre-Activations Trainable [65.40281259525578]
We consider the training of the first layer of vision models and notice the clear relationship between pixel values and update magnitudes.
An image with low contrast has a smaller impact on learning than an image with higher contrast.
A very bright or very dark image has a stronger impact on the weights than an image with moderate brightness.
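
As a quick illustration of the observation in this summary (a toy example of our own, not the paper's method): for a linear first layer the weight gradient is an outer product with the input, so its magnitude scales directly with the pixel values.

```python
# Toy illustration: first-layer weight gradients scale with input pixel values,
# so a brighter input produces larger weight updates than a dark one.
import torch

W = torch.zeros(4, 16, requires_grad=True)      # hypothetical first layer
x_dark = torch.full((16,), 0.1)                 # low-intensity input
x_bright = torch.full((16,), 0.9)               # high-intensity input

for name, x in [("dark", x_dark), ("bright", x_bright)]:
    W.grad = None
    loss = (W @ x).sum()                        # dL/dW is the outer product of dL/dy and x
    loss.backward()
    print(name, W.grad.abs().mean().item())     # ~0.1 vs ~0.9
```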
arXiv Detail & Related papers (2024-10-31T14:25:55Z)
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
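
A hedged sketch of the log-probability combination behind EFT-style up-scaling as summarised above, assuming the standard formulation (a large pre-trained base plus the fine-tuning delta of a small model); the function name and tensor shapes are our own.

```python
# Sketch of EFT-style up-scaling: add the small model's fine-tuning "delta"
# (log-prob difference) to a large pre-trained model's log-probs, then renormalise.
import torch
import torch.nn.functional as F

def eft_log_probs(logits_large_pre: torch.Tensor,
                  logits_small_ft: torch.Tensor,
                  logits_small_pre: torch.Tensor) -> torch.Tensor:
    lp_large_pre = F.log_softmax(logits_large_pre, dim=-1)
    lp_small_ft = F.log_softmax(logits_small_ft, dim=-1)
    lp_small_pre = F.log_softmax(logits_small_pre, dim=-1)
    combined = lp_large_pre + (lp_small_ft - lp_small_pre)
    return F.log_softmax(combined, dim=-1)   # renormalise before sampling

# Example: next-token distribution over a dummy vocabulary of 32 tokens.
vocab = 32
probs = eft_log_probs(torch.randn(vocab), torch.randn(vocab), torch.randn(vocab)).exp()
```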
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation [7.432224771219168]
The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task.
In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks.
arXiv Detail & Related papers (2023-07-12T08:35:24Z)
- Pre-text Representation Transfer for Deep Learning with Limited Imbalanced Data: Application to CT-based COVID-19 Detection [18.72489078928417]
We propose a novel concept of Pre-text Representation Transfer (PRT).
PRT retains the original classification layers and updates the representation layers through an unsupervised pre-text task.
Our results show a consistent gain over the conventional transfer learning with the proposed method.
arXiv Detail & Related papers (2023-01-21T04:47:35Z)
- Surgical Fine-Tuning Improves Adaptation to Distribution Shifts [114.17184775397067]
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model.
This paper shows that in such settings, selectively fine-tuning a subset of layers matches or outperforms commonly used fine-tuning approaches.
arXiv Detail & Related papers (2022-10-20T17:59:15Z)
- PatchNR: Learning from Small Data by Patch Normalizing Flow Regularization [57.37911115888587]
We introduce a regularizer for the variational modeling of inverse problems in imaging based on normalizing flows.
Our regularizer, called patchNR, involves a normalizing flow learned on patches of very few images.
arXiv Detail & Related papers (2022-05-24T12:14:26Z)
- Non-binary deep transfer learning for image classification [1.858151490268935]
The current standard for computer vision tasks is to fine-tune weights pre-trained on a large image classification dataset such as ImageNet.
The application of transfer learning and transfer learning methods tends to be rigidly binary.
We present methods for non-binary transfer learning including combining L2SP and L2 regularization.
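
As a rough sketch of the L2-SP idea mentioned above (our own illustration, with hypothetical coefficients alpha and beta): shared layers are regularised towards their pre-trained values rather than towards zero, while newly added layers get plain L2 decay.

```python
# Sketch of an L2-SP-style penalty: pull shared weights towards the pre-trained
# starting point, and apply ordinary L2 decay to newly added parameters.
import torch
import torch.nn as nn

def l2sp_penalty(model: nn.Module, pretrained_state: dict,
                 alpha: float = 0.01, beta: float = 0.01):
    reg = 0.0
    for name, p in model.named_parameters():
        if name in pretrained_state:                       # shared pre-trained layer
            ref = pretrained_state[name].detach().to(p.device)
            reg = reg + alpha * (p - ref).pow(2).sum()
        else:                                              # new layer (e.g. classifier head)
            reg = reg + beta * p.pow(2).sum()
    return 0.5 * reg

# Usage: loss = task_loss + l2sp_penalty(model, pretrained_state_dict)
```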
arXiv Detail & Related papers (2021-07-19T02:34:38Z)
- Channel Scaling: A Scale-and-Select Approach for Transfer Learning [2.6304695993930594]
Transfer learning with pre-trained neural networks is a common strategy for training classifiers in medical image analysis.
We propose a novel approach to efficiently build small and well-performing networks by introducing channel-scaling layers.
By imposing L1 regularization and thresholding on the scaling weights, this framework iteratively removes unnecessary feature channels from a pre-trained model.
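
A rough sketch of a channel-scaling layer in the spirit of this summary (details beyond the summary, such as the threshold value, are our assumptions): a learnable per-channel scale sits after a frozen pre-trained layer, an L1 penalty drives scales towards zero, and channels whose scale falls below a threshold are pruned.

```python
# Sketch of a channel-scaling layer: learnable per-channel scales with an L1
# penalty; channels with near-zero scale can be pruned from the frozen backbone.
import torch
import torch.nn as nn

class ChannelScale(nn.Module):
    def __init__(self, num_channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (N, C, H, W)
        return x * self.scale.view(1, -1, 1, 1)

    def l1_penalty(self) -> torch.Tensor:
        return self.scale.abs().sum()

    def channels_to_keep(self, threshold: float = 1e-2) -> torch.Tensor:
        return (self.scale.abs() > threshold).nonzero(as_tuple=True)[0]

# Usage: insert after a frozen conv block, add lambda * l1_penalty() to the loss,
# and periodically drop channels not returned by channels_to_keep().
```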
arXiv Detail & Related papers (2021-03-22T23:26:57Z)
- Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning [76.98364915566292]
A common practice is to train a model on the base set first and then transfer to novel classes through fine-tuning.
We propose to transfer partial knowledge by freezing or fine-tuning particular layer(s) in the base model.
We conduct extensive experiments on CUB and mini-ImageNet to demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-08T03:27:05Z)
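
A minimal sketch (our own, not this paper's code) of the "freeze or fine-tune particular layers" strategy shared by this entry and the surgical fine-tuning work above: keep all parameters frozen except a chosen subset of named blocks. The backbone and the choice of blocks are hypothetical.

```python
# Sketch of partial transfer: fine-tune only a chosen subset of blocks of a
# pre-trained backbone and keep every other layer frozen.
import torch.nn as nn
from torchvision import models

def unfreeze_only(model: nn.Module, block_names):
    """Freeze all parameters, then re-enable gradients for the named submodules."""
    for p in model.parameters():
        p.requires_grad = False
    modules = dict(model.named_modules())
    for name in block_names:
        for p in modules[name].parameters():
            p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

backbone = models.resnet18(weights="IMAGENET1K_V1")
trainable = unfreeze_only(backbone, ["layer4", "fc"])   # hypothetical choice of blocks
```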