Adaptive Signal Variances: CNN Initialization Through Modern
Architectures
- URL: http://arxiv.org/abs/2008.06885v2
- Date: Sat, 29 Aug 2020 06:11:44 GMT
- Title: Adaptive Signal Variances: CNN Initialization Through Modern
Architectures
- Authors: Takahiko Henmi, Esmeraldo Ronnie Rey Zara, Yoshihiro Hirohashi,
Tsuyoshi Kato
- Abstract summary: Deep convolutional neural networks (CNNs) have earned broad confidence for their performance on image processing tasks.
CNN practitioners widely understand that the stability of learning depends on how the model parameters in each layer are initialized.
- Score: 0.7646713951724012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep convolutional neural networks (CNNs) have earned broad
confidence for their performance on image processing tasks. A CNN architecture
comprises a variety of layer types, including the convolution layer and the
max-pooling layer. CNN practitioners widely understand that the stability of
learning depends on how the model parameters in each layer are initialized.
Today, the de facto standard initialization scheme is the so-called Kaiming
initialization developed by He et al. The Kaiming scheme, however, was derived
from a much simpler model than the CNN architectures in use today, which have
evolved considerably since the scheme was introduced: the Kaiming model
consists only of convolution and fully connected layers, ignoring the
max-pooling layer and the global average pooling layer. In this study, we
derive the initialization scheme anew, not from the simplified Kaiming model
but precisely from modern CNN architectures, and empirically investigate how
the new initialization method performs compared with the de facto standard
schemes that are widely used today.
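To make the variance-scaling idea concrete, the following is a minimal PyTorch sketch of fan-in (Kaiming-style) initialization with an optional per-layer correction factor standing in for the architecture-aware adjustment that the paper derives for layers followed by max-pooling or global average pooling. The helper name kaiming_conv_init and the example correction value are illustrative assumptions, not the paper's formula.

    import math
    import torch.nn as nn

    def kaiming_conv_init(conv: nn.Conv2d, gain_correction: float = 1.0) -> None:
        """Kaiming-style fan-in initialization for a convolution layer.

        gain_correction is a hypothetical per-layer factor standing in for an
        architecture-aware adjustment (e.g. compensating for the variance change
        introduced by a following max-pooling layer); the paper's exact values
        are not reproduced here.
        """
        fan_in = conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]
        std = math.sqrt(2.0 / fan_in) * gain_correction  # He et al.: Var[w] = 2 / fan_in
        nn.init.normal_(conv.weight, mean=0.0, std=std)
        if conv.bias is not None:
            nn.init.zeros_(conv.bias)

    # Usage on a small VGG-like block: apply an (illustrative) correction only
    # where a max-pooling layer follows the convolution.
    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    )
    kaiming_conv_init(model[0], gain_correction=1.2)  # hypothetical factor for the pooled layer
    kaiming_conv_init(model[3])                       # plain Kaiming scaling

The sketch only illustrates that the appropriate weight variance can depend on the layers surrounding a convolution; working out that dependence precisely for modern architectures is what the paper does.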
Related papers
- Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition [0.0]
Deep convolutional neural networks (CNNs) have been shown to be very successful in a wide range of image processing applications.
Due to their growing number of model parameters and the increasing availability of large amounts of training data, parallelization strategies for training complex CNNs efficiently are necessary.
arXiv Detail & Related papers (2024-08-26T17:35:01Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with better, architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- A Gradient Boosting Approach for Training Convolutional and Deep Neural Networks [0.0]
We introduce two procedures for training Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) based on Gradient Boosting (GB).
The presented models show superior classification accuracy with respect to standard CNNs and DNNs with the same architectures.
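As a rough illustration of the general idea (and not of the two procedures introduced in the cited paper), the sketch below performs textbook gradient boosting with small neural networks as weak learners: under the squared loss the negative functional gradient is the residual, so each stage fits a new network to what the current ensemble still misses.

    import torch
    import torch.nn as nn

    def fit_stage(x, residual, epochs=200, lr=1e-2):
        """Fit one small weak learner to the current residuals (squared loss)."""
        net = nn.Sequential(nn.Linear(x.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            ((net(x) - residual) ** 2).mean().backward()
            opt.step()
        return net

    def boost(x, y, n_stages=5, shrinkage=0.5):
        """Textbook gradient boosting with neural-network weak learners."""
        prediction, stages = torch.zeros(len(y), 1), []
        for _ in range(n_stages):
            net = fit_stage(x, y - prediction)           # fit the current residuals
            with torch.no_grad():
                prediction = prediction + shrinkage * net(x)
            stages.append(net)
        return stages, prediction

    # Toy usage: learn y = sin(x) from random inputs.
    torch.manual_seed(0)
    x = torch.randn(256, 1)
    y = torch.sin(x)
    stages, pred = boost(x, y)
    print(f"final training MSE: {((pred - y) ** 2).mean().item():.4f}")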
arXiv Detail & Related papers (2023-02-22T12:17:32Z)
- Effects of Architectures on Continual Semantic Segmentation [0.0]
We study how the choice of neural network architecture affects catastrophic forgetting in class- and domain-incremental semantic segmentation.
We find that traditional CNNs like ResNet have high plasticity but low stability, while transformer architectures are much more stable.
arXiv Detail & Related papers (2023-02-21T15:12:01Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
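The setting described above is easy to reproduce empirically. The toy script below is only an illustration under assumed hyperparameters, not the paper's analysis: a depth-2 ReLU network trained by plain full-batch gradient descent under the quadratic loss, on Gaussian inputs with labels drawn independently of the data.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, d, width = 32, 20, 512
    x = torch.randn(n, d)                      # Gaussian data instances
    y = torch.sign(torch.randn(n, 1))          # labels independent of x ("adversarial")
    net = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    for step in range(10000):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()      # quadratic (squared) loss
        loss.backward()
        opt.step()
    # The loss should fall far below its initial value, illustrating convergence
    # toward a global minimum in this over-parameterized toy setting.
    print(f"final training loss: {loss.item():.3e}")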
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
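A greatly simplified stand-in for that heuristic is sketched below: instead of learning per-layer scale factors by gradient descent as the paper does, it greedily tries a few candidate scales per layer and keeps whichever yields the lowest loss after a single SGD step. The candidate grid, learning rate, and toy model are assumptions made for illustration only.

    import copy
    import torch
    import torch.nn as nn

    def loss_after_one_sgd_step(model, batch, lr=0.1):
        """Take one SGD step on a throwaway copy of the model and report the loss."""
        trial = copy.deepcopy(model)
        x, y = batch
        opt = torch.optim.SGD(trial.parameters(), lr=lr)
        opt.zero_grad()
        nn.functional.cross_entropy(trial(x), y).backward()
        opt.step()
        with torch.no_grad():
            return nn.functional.cross_entropy(trial(x), y).item()

    def rescale_layer(model, index, factor):
        """Multiply one layer's weights by a scalar, in place."""
        with torch.no_grad():
            model[index].weight.mul_(factor)

    torch.manual_seed(0)
    x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
    for index in (0, 2):                                # the two Linear layers
        candidates = [0.5, 1.0, 2.0]
        scores = []
        for factor in candidates:
            rescale_layer(model, index, factor)
            scores.append(loss_after_one_sgd_step(model, (x, y)))
            rescale_layer(model, index, 1.0 / factor)   # undo the trial scaling
        best = candidates[scores.index(min(scores))]
        rescale_layer(model, index, best)               # commit the best scale
        print(f"layer {index}: scale {best}, post-step loss {min(scores):.3f}")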
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Convolutional Neural Network Simplification with Progressive Retraining [0.0]
Kernel pruning methods have been proposed to speed up, simplify, and improve explanation of convolutional neural network (CNN) models.
We present new methods based on objective and subjective relevance criteria for kernel elimination in a layer-by-layer fashion.
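The sketch below illustrates layer-by-layer kernel elimination using the L1 norm of each kernel as a stand-in relevance score; the objective and subjective criteria proposed in the cited paper, and the progressive retraining between layers, are not reproduced here.

    import torch
    import torch.nn as nn

    def prune_kernels(conv: nn.Conv2d, keep_ratio: float = 0.75) -> None:
        """Zero out the least relevant kernels of one conv layer, in place.

        Relevance here is the L1 norm of each output kernel, a common stand-in
        criterion used only for illustration.
        """
        with torch.no_grad():
            relevance = conv.weight.abs().sum(dim=(1, 2, 3))   # one score per output kernel
            n_keep = max(1, int(keep_ratio * conv.out_channels))
            keep = torch.topk(relevance, n_keep).indices
            mask = torch.zeros(conv.out_channels, dtype=torch.bool)
            mask[keep] = True
            conv.weight[~mask] = 0.0
            if conv.bias is not None:
                conv.bias[~mask] = 0.0

    # Layer-by-layer usage on a toy CNN (retraining between layers is omitted).
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3), nn.ReLU())
    for layer in model:
        if isinstance(layer, nn.Conv2d):
            prune_kernels(layer, keep_ratio=0.75)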
arXiv Detail & Related papers (2021-01-12T19:05:42Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that such CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Eigen-CAM: Class Activation Map using Principal Components [1.2691047660244335]
This paper builds on previous ideas to cope with the increasing demand for interpretable, robust, and transparent models.
The proposed Eigen-CAM computes and visualizes the principal components of the learned features/representations from the convolutional layers.
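The core computation is compact enough to sketch: project the feature maps of a convolutional layer onto their leading principal direction (first right singular vector) and reshape the projection into a spatial saliency map. The untrained toy backbone below is an assumption used only to produce feature maps of the right shape; in practice the features would come from a trained CNN.

    import torch
    import torch.nn as nn

    def eigen_cam(feature_maps: torch.Tensor) -> torch.Tensor:
        """Project conv feature maps (C, H, W) onto their first principal component."""
        c, h, w = feature_maps.shape
        flat = feature_maps.reshape(c, h * w).T          # (H*W, C): one row per spatial location
        _, _, vt = torch.linalg.svd(flat, full_matrices=False)
        cam = flat @ vt[0]                               # projection onto the leading direction
        cam = cam.reshape(h, w).relu()                   # keep positively contributing regions
        return cam / (cam.max() + 1e-8)                  # normalize to [0, 1] for visualization

    # Toy usage with a stand-in backbone; real usage would hook a trained model's last conv layer.
    backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
    image = torch.randn(1, 3, 64, 64)
    with torch.no_grad():
        features = backbone(image)[0]                    # (64, 64, 64)
    print(eigen_cam(features).shape)                     # torch.Size([64, 64])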
arXiv Detail & Related papers (2020-08-01T17:14:13Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
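In the spirit of the description above (and of DSN-style deep supervision generally), the sketch below forks a single auxiliary classification branch from an intermediate layer and adds its loss to the main objective. The cited paper's branch designs and hierarchical mimicking losses are more elaborate and are not reproduced here; the architecture and loss weight are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DeeplySupervisedNet(nn.Module):
        """A backbone with one auxiliary side branch forked from an intermediate layer."""
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.main_head = nn.Linear(64, num_classes)
            # Side branch forked from the intermediate representation after stage1.
            self.side_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                           nn.Linear(32, num_classes))

        def forward(self, x):
            mid = self.stage1(x)
            return self.main_head(self.stage2(mid)), self.side_head(mid)

    # Toy training step: the total objective combines the main and side-branch losses.
    model = DeeplySupervisedNet()
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    main_logits, side_logits = model(x)
    loss = (nn.functional.cross_entropy(main_logits, y)
            + 0.3 * nn.functional.cross_entropy(side_logits, y))
    loss.backward()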
This list is automatically generated from the titles and abstracts of the papers on this site.