A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization
- URL: http://arxiv.org/abs/2112.12299v1
- Date: Thu, 23 Dec 2021 01:13:15 GMT
- Title: A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization
- Authors: Enrico Civitelli, Alessio Sortino, Matteo Lapucci, Francesco Bagattini and Giulio Galvan
- Abstract summary: Batch Normalization is an essential component of all state-of-the-art neural network architectures.
We show that weight initialization is key to training ResNet-like normalization-free networks.
We show that this modified architecture achieves competitive results on CIFAR-10 without further regularization or algorithmic modifications.
- Score: 0.9449650062296823
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Batch Normalization is an essential component of all state-of-the-art neural network architectures. However, since it introduces many practical issues, much recent research has been devoted to designing normalization-free architectures. In this paper, we show that weight initialization is key to training ResNet-like normalization-free networks. In particular, we propose a slight modification to the operation that sums a block's output to the skip connection branch, so that the whole network is correctly initialized. We show that this modified architecture achieves competitive results on CIFAR-10 without further regularization or algorithmic modifications.
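The modified summation can be illustrated with a short sketch. The block below is a minimal, hedged example of the general idea behind initialization-aware residual summation: a learnable scalar on the residual branch, initialized to zero (in the spirit of SkipInit-style schemes), makes every block an identity map at initialization, so signals propagate stably without BatchNorm. The class name and layer sizes are our assumptions, and this is not necessarily the paper's exact modification.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Normalization-free residual block whose output summation is
    modified so the block is an identity map at initialization.
    Illustrative sketch only, not the paper's exact scheme."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # Scalar gate on the residual branch, initialized to zero: at init
        # the block reduces to the identity, so signal propagation is
        # well-behaved without BatchNorm.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(x + self.alpha * out)

x = torch.randn(8, 64, 32, 32)
block = ScaledResidualBlock(64)
assert torch.allclose(block(x), nn.functional.relu(x))  # identity at init
```

At initialization the whole network then behaves like a stack of identity-plus-ReLU maps, which is the well-conditioned starting point that BatchNorm otherwise provides.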
Related papers
- Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration [62.41329042683779]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z)
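For readers unfamiliar with rotation symmetry priors, the entry above can be grounded with a generic sketch: a C4 lifting convolution from the group-equivariant CNN literature, which applies one learned kernel in four rotated copies. This only illustrates rotation equivariance; it is not the paper's proximal network, and the class name and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4LiftingConv(nn.Module):
    """Generic C4 (90-degree) rotation-equivariant lifting convolution.
    Rotating the input rotates the feature maps and cyclically permutes
    the orientation channels. Illustrative only."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack the kernel rotated by 0, 90, 180 and 270 degrees.
        ws = [torch.rot90(self.weight, r, dims=(-2, -1)) for r in range(4)]
        w = torch.cat(ws, dim=0)                   # (4*out_ch, in_ch, k, k)
        y = F.conv2d(x, w, padding=self.weight.shape[-1] // 2)
        b, _, h, wd = y.shape
        return y.view(b, 4, -1, h, wd)             # (B, 4 orientations, C, H, W)
```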
- Towards Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization [14.73423587548693]
We propose efficient architecture-agnostic techniques to directly modulate the spectral bias of network priors.
We show that, with just a few lines of code, we can reduce overfitting in underperforming architectures and close performance gaps with high-performing counterparts.
These results show for the first time that architectural biases, overfitting, and runtime issues of untrained network priors can be simultaneously addressed without architectural modifications.
arXiv Detail & Related papers (2023-12-15T18:01:47Z)
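The "few lines of code" claim in the entry above suggests a loss-level regularizer. A hedged sketch of one such spectral-bias control is a radial high-frequency penalty computed with an FFT; the function name, cutoff, and weighting below are our assumptions, not the paper's exact regularizer.

```python
import torch

def high_frequency_penalty(img: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Hypothetical frequency regularizer: penalize spectral energy above a
    radial cutoff (fraction of the sampling band). img: (B, C, H, W)."""
    spec = torch.fft.fftshift(torch.fft.fft2(img, norm="ortho"), dim=(-2, -1))
    h, w = img.shape[-2:]
    fy = torch.linspace(-0.5, 0.5, h, device=img.device).view(-1, 1)
    fx = torch.linspace(-0.5, 0.5, w, device=img.device).view(1, -1)
    radius = torch.sqrt(fx ** 2 + fy ** 2)
    mask = (radius > cutoff).to(img.dtype)   # 1 on high frequencies only
    return (spec.abs() ** 2 * mask).mean()

# Usage inside a deep-image-prior style loop (names are assumptions):
# loss = mse(net(z), noisy) + 0.1 * high_frequency_penalty(net(z))
```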
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on a Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
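One plausible reading of the distance-based weighting above, sketched below, is to bias attention logits by the pairwise spatial distance between token positions. The bias form and the gamma coefficient are assumptions, not the paper's DWT block.

```python
import torch

def distance_weighted_attention(q, k, v, coords, gamma: float = 1.0):
    """Sketch of distance-weighted self-attention: logits are down-weighted
    by spatial distance between positions. q, k, v: (B, N, D) token
    features; coords: (N, 2) token positions."""
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d ** 0.5    # (B, N, N) similarity
    dist = torch.cdist(coords, coords)             # (N, N) pairwise distances
    attn = torch.softmax(logits - gamma * dist, dim=-1)
    return attn @ v
```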
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to their magnitude on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
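The shrink-instead-of-prune idea in the summary above can be sketched as follows; the percentile rule and shrink factor are assumptions rather than the paper's exact ISS-P schedule.

```python
import torch

@torch.no_grad()
def iterative_soft_shrink(weight: torch.Tensor, sparsity: float, shrink: float = 0.1):
    """Sketch of the soft-shrinkage idea: instead of hard-zeroing, weights
    below the magnitude percentile are multiplied by (1 - shrink), i.e.
    shrunk by an amount proportional to their own magnitude, so the sparse
    structure can still change across iterations."""
    flat = weight.abs().flatten()
    k = int(sparsity * flat.numel())
    if k == 0:
        return weight
    threshold = flat.kthvalue(k).values     # magnitude at the target percentile
    mask = weight.abs() <= threshold
    weight[mask] *= (1.0 - shrink)
    return weight

# Called once per training iteration on each prunable layer, e.g.:
# for p in model.parameters(): iterative_soft_shrink(p, sparsity=0.9)
```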
- Dynamical Isometry for Residual Networks [8.21292084298669]
We show that RISOTTO achieves perfect dynamical isometry for residual networks with ReLU activation functions even for finite depth and width.
In experiments, we demonstrate that our approach outperforms schemes proposed to make Batch Normalization obsolete, including Fixup and SkipInit.
arXiv Detail & Related papers (2022-10-05T17:33:23Z)
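Dynamical isometry itself is easy to check empirically. The diagnostic below computes the singular values of the input-output Jacobian at initialization (all equal to 1 under perfect isometry); it measures the property RISOTTO targets rather than implementing the initializer, and the toy model is our assumption.

```python
import torch
import torch.nn as nn

def isometry_spread(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return the singular values of the network's input-output Jacobian at
    one point. Perfect dynamical isometry means they all equal 1, so their
    spread measures how far an initialization is from isometry."""
    J = torch.autograd.functional.jacobian(model, x)   # (out_dim, in_dim)
    return torch.linalg.svdvals(J)

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
s = isometry_spread(model, torch.randn(64))
print(s.max().item(), s.min().item())   # far from 1.0 under default init
```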
- Deep Neural Networks pruning via the Structured Perspective Regularization [5.061851539114448]
In Machine Learning, Artificial Neural Networks (ANNs) are a very powerful tool, broadly used in many applications.
One of the most popular compression approaches is pruning, whereby entire elements of the ANN (links, nodes, channels, ...) and the corresponding weights are deleted.
Since the nature of the problem is inherently combinatorial (what elements to prune and what not), we propose a new pruning method based on Operational Research tools.
arXiv Detail & Related papers (2022-06-28T14:58:51Z)
- AutoInit: Automatic Initialization via Jacobian Tuning [7.9603223299524535]
We introduce a new and cheap algorithm that allows one to find a good initialization automatically for general feed-forward DNNs.
We solve the dynamics of the algorithm for fully connected networks with ReLU and derive conditions for its convergence.
We apply our method to ResMLP and VGG architectures, where the automatic one-shot initialization found by our method shows good performance on vision tasks.
arXiv Detail & Related papers (2022-06-27T18:14:51Z)
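The Jacobian-tuning idea above can be sketched as a simple rescaling loop; the norm-ratio proxy, step count, and model below are our assumptions, not the paper's algorithm.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def jacobian_tune(model: nn.Sequential, x: torch.Tensor, steps: int = 10):
    """Illustrative Jacobian-tuning loop: rescale each linear layer so the
    norm of its output roughly matches the norm of its input, keeping the
    end-to-end Jacobian well conditioned at initialization."""
    for _ in range(steps):
        h = x
        for layer in model:
            out = layer(h)
            if isinstance(layer, nn.Linear):
                # Ratio > 1 means the layer inflates signals; shrink it.
                ratio = out.norm() / (h.norm() + 1e-8)
                layer.weight.mul_(1.0 / ratio.clamp(min=1e-3))
                out = layer(h)   # recompute with the rescaled weight
            h = out
    return model

model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
jacobian_tune(model, torch.randn(128, 64))
```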
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Łojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- ZerO Initialization: Initializing Residual Networks with only Zeros and Ones [44.66636787050788]
Deep neural networks are usually initialized with random weights, with an adequately selected initial variance to ensure stable signal propagation during training.
There is no consensus on how to select the variance, and this becomes challenging as the number of layers grows.
In this work, we replace the widely used random weight initialization with a fully deterministic initialization scheme ZerO, which initializes residual networks with only zeros and ones.
Surprisingly, we find that ZerO achieves state-of-the-art performance over various image classification datasets, including ImageNet.
arXiv Detail & Related papers (2021-10-25T06:17:33Z)
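A strongly simplified sketch of a zeros-and-ones initialization is shown below: linear layers start as (partial) identities so inputs pass through unchanged. The actual ZerO scheme additionally uses Hadamard transforms when dimensions change; this sketch omits that, and the helper name is ours.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def zero_ones_init(linear: nn.Linear) -> None:
    """Simplified deterministic init in the spirit of ZerO: the weight
    matrix contains only zeros and ones, arranged as a (partial) identity
    so the layer passes its input through at initialization."""
    linear.weight.zero_()
    n = min(linear.weight.shape)
    idx = torch.arange(n)
    linear.weight[idx, idx] = 1.0   # ones on the diagonal, zeros elsewhere
    if linear.bias is not None:
        linear.bias.zero_()

layer = nn.Linear(64, 128)
zero_ones_init(layer)
x = torch.randn(4, 64)
assert torch.allclose(layer(x)[:, :64], x)  # first 64 outputs copy the input
```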
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
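Constraining the hypothesis class to a sphere around the pre-trained weights, as described above, can be read as projected SGD. The hedged sketch below projects all parameters back onto an L2 ball after each optimizer step; the radius and the whole-vector projection are assumptions, not necessarily the paper's exact algorithm.

```python
import torch

@torch.no_grad()
def project_to_ball(model, init_params, radius: float):
    """Project all parameters onto an L2 ball of the given radius centred
    on the pre-trained weights, treating the full parameter vector as one
    point. Sketch of distance-based regularisation via projection."""
    diffs = [p - p0 for p, p0 in zip(model.parameters(), init_params)]
    norm = torch.sqrt(sum((d ** 2).sum() for d in diffs))
    if norm > radius:
        scale = radius / norm
        for p, p0, d in zip(model.parameters(), init_params, diffs):
            p.copy_(p0 + scale * d)

# Usage after each optimizer step (init_params cloned before fine-tuning):
# init_params = [p.detach().clone() for p in model.parameters()]
# optimizer.step(); project_to_ball(model, init_params, radius=1.0)
```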
- BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method [69.49386965992464]
We propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method.
Our framework is universal and can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers.
It is the first time a weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
arXiv Detail & Related papers (2020-01-23T03:30:56Z)
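As a generic illustration of reweighted regularization over weight blocks (not BLK-REW's exact formulation), the penalty below reweights each block's squared norm by its current inverse magnitude, so already-small blocks are pushed harder toward zero; the block size and coefficient form are assumptions.

```python
import torch

def reweighted_block_penalty(weight: torch.Tensor, block: int = 4, eps: float = 1e-4):
    """Generic reweighted block-regularization term: tile the weight matrix
    into block x block tiles and penalize each tile's squared norm with a
    coefficient inversely proportional to its current magnitude.
    Assumes both dimensions are divisible by `block`."""
    out, inp = weight.shape
    blocks = weight.reshape(out // block, block, inp // block, block)
    norms = blocks.pow(2).sum(dim=(1, 3))      # squared norm per tile
    coeff = 1.0 / (norms.detach() + eps)       # reweighting, no gradient
    return (coeff * norms).sum()

# Added to the task loss each step, e.g.:
# loss = criterion(model(x), y) + 1e-4 * reweighted_block_penalty(fc.weight)
```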
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.