Using UNet and PSPNet to explore the reusability principle of CNN
parameters
- URL: http://arxiv.org/abs/2008.03414v1
- Date: Sat, 8 Aug 2020 01:51:08 GMT
- Title: Using UNet and PSPNet to explore the reusability principle of CNN
parameters
- Authors: Wei Wang
- Abstract summary: Reusability of parameters in each layer of a deep convolutional neural network is experimentally quantified.
The running mean and running variance play a more important role than the weight and bias in the BN layer.
The bias in convolution layers is not sensitive and can be reused directly.
- Score: 5.623232537411766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to reduce the required training dataset size is a hot topic in the
deep learning community. One straightforward way is to reuse pre-trained
parameters. Previous work such as deep transfer learning reuses the model
parameters trained for a first task as the starting point for a second task,
and semi-supervised learning trains on a combination of labeled and unlabeled
data. However, the fundamental reason for the success of these methods is
unclear. In this paper, the reusability of the parameters in each layer of a
deep convolutional neural network is experimentally quantified by using the
same network for a segmentation task and an auto-encoder task. This paper
shows that network parameters can be reused for two reasons: first, the
network features are general; second, there is little difference between the
pre-trained parameters and the ideal network parameters. Through parameter
replacement and comparison, we demonstrate that reusability differs between
BN (Batch Normalization) [7] layers and convolution layers, and make the
following observations: (1) the running mean and running variance play a more
important role than the weight and bias in BN layers; (2) the weight and bias
can be reused in BN layers; (3) the network is very sensitive to the weights
of convolution layers; (4) the bias in convolution layers is not sensitive and
can be reused directly.
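The replacement protocol is only summarized in the abstract, so the following is a minimal sketch of how such a parameter-replacement experiment could be set up in PyTorch. The toy TinyNet model, the replace_group helper, and the output-RMSE comparison are illustrative assumptions, not the authors' UNet/PSPNet implementation.

```python
# A minimal sketch (assumption, not the authors' code) of the parameter-
# replacement experiment described in the abstract: copy one group of
# pre-trained parameters into an otherwise "ideal" model and measure how
# much the output changes. TinyNet stands in for UNet/PSPNet.
import copy

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Toy conv + BN block with a 1x1 head, standing in for UNet/PSPNet."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.head = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, x):
        return self.head(torch.relu(self.bn(self.conv(x))))


def replace_group(target: nn.Module, source: nn.Module, group: str) -> nn.Module:
    """Return a copy of `target` with one parameter group taken from `source`.

    group is one of: 'bn_stats', 'bn_affine', 'conv_weight', 'conv_bias'.
    """
    patched = copy.deepcopy(target)
    src = dict(source.named_modules())
    with torch.no_grad():
        for name, m in patched.named_modules():
            s = src[name]
            if isinstance(m, nn.BatchNorm2d):
                if group == "bn_stats":
                    m.running_mean.copy_(s.running_mean)
                    m.running_var.copy_(s.running_var)
                elif group == "bn_affine":
                    m.weight.copy_(s.weight)
                    m.bias.copy_(s.bias)
            elif isinstance(m, nn.Conv2d):
                if group == "conv_weight":
                    m.weight.copy_(s.weight)
                elif group == "conv_bias" and m.bias is not None:
                    m.bias.copy_(s.bias)
    return patched


if __name__ == "__main__":
    torch.manual_seed(0)
    # In the real experiment both models would be trained (e.g. on the
    # segmentation and auto-encoder tasks); here they are just two random
    # initializations to keep the sketch self-contained.
    ideal, pretrained = TinyNet().eval(), TinyNet().eval()
    x = torch.randn(4, 3, 32, 32)
    baseline = ideal(x)
    for group in ("bn_stats", "bn_affine", "conv_weight", "conv_bias"):
        out = replace_group(ideal, pretrained, group)(x)
        rmse = (out - baseline).pow(2).mean().sqrt().item()
        print(f"{group}: output RMSE after replacement = {rmse:.4f}")
```

Under this setup, comparing the per-group output RMSE is one way to reproduce the qualitative ranking reported in the observations (e.g. high sensitivity to convolution weights, low sensitivity to convolution biases).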
Related papers
- NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability [0.0]
This paper presents neural networks for network intrusion detection systems (NIDS) that operate on flow data preprocessed with a time window.
It requires only eleven features, which do not rely on deep packet inspection, can be found in most NIDS datasets, and are easily obtained from conventional flow collectors.
The reported training accuracy exceeds 99% for the proposed method with as few as twenty neural network input features.
arXiv Detail & Related papers (2024-10-24T11:36:19Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel effective method to improve generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Higher-order input statistics are exploited only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have achieved wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods attempt to reduce the size of this parameter set by identifying and removing irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z)
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transits the network into the chaotic regime, like a BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
- Exploring the parameter reusability of CNN [12.654187477646449]
We propose a solution that can judge whether a given network is reusable or not based on the performance of reusing convolution kernels.
We define the success of a CNN's parameter reuse to depend on two conditions: first, the network is a reusable network; and second, the RMSE between the convolution kernels from the source domain and the target domain is small enough (a minimal sketch of this comparison appears after this list).
arXiv Detail & Related papers (2020-08-08T01:23:22Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
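For the kernel-RMSE condition mentioned in "Exploring the parameter reusability of CNN" above, here is a minimal sketch, assuming two PyTorch models that share the same architecture. The kernel_rmse helper is a hypothetical illustration, not the authors' code.

```python
# Illustrative sketch (assumption, not the authors' code): per-layer RMSE
# between corresponding convolution kernels of a source-domain model and a
# target-domain model with the same architecture.
import torch
import torch.nn as nn


def kernel_rmse(source: nn.Module, target: nn.Module) -> dict:
    """Map each Conv2d layer name to the RMSE between its source/target weights."""
    tgt = dict(target.named_modules())
    rmse = {}
    for name, m in source.named_modules():
        if isinstance(m, nn.Conv2d):
            diff = m.weight.detach() - tgt[name].weight.detach()
            rmse[name] = diff.pow(2).mean().sqrt().item()
    return rmse


if __name__ == "__main__":
    # Two random models with the same architecture stand in for the
    # source-domain and target-domain networks.
    def make_model():
        return nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))

    torch.manual_seed(0)
    print(kernel_rmse(make_model(), make_model()))
```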
This list is automatically generated from the titles and abstracts of the papers in this site.