Channel Scaling: A Scale-and-Select Approach for Transfer Learning
- URL: http://arxiv.org/abs/2103.12228v1
- Date: Mon, 22 Mar 2021 23:26:57 GMT
- Title: Channel Scaling: A Scale-and-Select Approach for Transfer Learning
- Authors: Ken C. L. Wong, Satyananda Kashyap, Mehdi Moradi
- Abstract summary: Transfer learning with pre-trained neural networks is a common strategy for training classifiers in medical image analysis.
We propose a novel approach to efficiently build small and well-performing networks by introducing channel-scaling layers.
By imposing L1 regularization and thresholding on the scaling weights, this framework iteratively removes unnecessary feature channels from a pre-trained model.
- Score: 2.6304695993930594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning with pre-trained neural networks is a common strategy for
training classifiers in medical image analysis. Without proper channel
selection, this often results in unnecessarily large models that hinder
deployment and explainability. In this paper, we propose a novel approach to
efficiently build small and well-performing networks by introducing
channel-scaling layers. A channel-scaling layer is attached to each frozen
convolutional layer, with the trainable scaling weights inferring the
importance of the corresponding feature channels. Unlike fine-tuning
approaches, we maintain the weights of the original channels, and large
datasets are not required. By imposing L1 regularization and thresholding on the scaling
weights, this framework iteratively removes unnecessary feature channels from a
pre-trained model. Using an ImageNet pre-trained VGG16 model, we demonstrate
the capabilities of the proposed framework on classifying opacity from chest
X-ray images. The results show that we can reduce the number of parameters by
95% while delivering superior performance.
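To make the mechanism concrete, the sketch below shows one way the scale-and-select idea could be wired up. It is a minimal illustration assuming PyTorch; the ChannelScaling module name and the l1_lambda and threshold values are hypothetical choices for illustration, not details taken from the paper. A trainable per-channel scaling layer is attached after each frozen convolution of an ImageNet pre-trained VGG16, the training loss adds an L1 penalty on the scaling weights, and channels whose weights fall below a threshold are flagged for removal.

# Minimal sketch of the scale-and-select idea (assumed PyTorch; names and
# hyperparameter values are illustrative, not taken from the paper).
import torch
import torch.nn as nn
from torchvision import models


class ChannelScaling(nn.Module):
    """Trainable per-channel scaling weights applied to a frozen conv layer's output."""

    def __init__(self, num_channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):
        # x has shape (N, C, H, W); scale each feature channel independently.
        return x * self.scale.view(1, -1, 1, 1)


# Freeze an ImageNet pre-trained VGG16 and insert a scaling layer after each conv.
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False

layers = []
for layer in backbone.features:
    layers.append(layer)
    if isinstance(layer, nn.Conv2d):
        layers.append(ChannelScaling(layer.out_channels))

model = nn.Sequential(
    *layers,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, 2),  # e.g. opacity vs. no opacity
)

scaling_weights = [m.scale for m in model.modules() if isinstance(m, ChannelScaling)]


def loss_fn(logits, targets, l1_lambda=1e-4):
    # Classification loss plus an L1 sparsity penalty on the scaling weights.
    ce = nn.functional.cross_entropy(logits, targets)
    l1 = sum(s.abs().sum() for s in scaling_weights)
    return ce + l1_lambda * l1


def channels_to_keep(threshold=1e-2):
    # Channels whose scaling weight magnitude falls below the threshold are
    # treated as unnecessary and would be removed before the next iteration.
    return [(s.abs() >= threshold).nonzero(as_tuple=True)[0] for s in scaling_weights]

In the iterative framework described in the abstract, this pruning step would be followed by rebuilding the network with only the kept channels and repeating the procedure; the exact schedule, L1 weight, and threshold are not specified in this sketch.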
Related papers
- Enhancing pretraining efficiency for medical image segmentation via transferability metrics [0.0]
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge.
We introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data.
arXiv Detail & Related papers (2024-10-24T12:11:52Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning high-quality internal representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Scale-Equivariant UNet for Histopathology Image Segmentation [1.213915839836187]
Convolutional Neural Networks (CNNs) trained on histopathology images at a given scale fail to generalise to images at different scales.
We propose the Scale-Equivariant UNet (SEUNet) for image segmentation by building on scale-space theory.
arXiv Detail & Related papers (2023-04-10T14:03:08Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but it struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reformulate the convolutional layer by resorting to scale-space theory.
We build a novel network named the SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, its inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- CHEX: CHannel EXploration for CNN Model Compression [47.3520447163165]
We propose a novel Channel Exploration methodology, dubbed CHEX, to address the limitations of conventional channel pruning.
CHEX repeatedly prunes and regrows the channels throughout the training process, which reduces the risk of pruning important channels prematurely.
Results demonstrate that CHEX can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks.
arXiv Detail & Related papers (2022-03-29T17:52:41Z)
- Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales [0.0]
We show that two previously proposed scale channel network designs do not generalise well to scales not present in the training set.
We propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution.
Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, even when trained on single-scale data, and also give improved performance when learning from datasets with large scale variations in the small-sample regime.
arXiv Detail & Related papers (2021-06-11T14:22:26Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in a mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we recover a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.