The Impact of Reinitialization on Generalization in Convolutional Neural Networks
- URL: http://arxiv.org/abs/2109.00267v1
- Date: Wed, 1 Sep 2021 09:25:57 GMT
- Title: The Impact of Reinitialization on Generalization in Convolutional Neural Networks
- Authors: Ibrahim Alabdulmohsin, Hartmut Maennel, Daniel Keysers
- Abstract summary: We study the impact of different reinitialization methods in several convolutional architectures across 12 benchmark image classification datasets.
We introduce a new layerwise reinitialization algorithm that outperforms previous methods.
Our takeaway message is that the accuracy of convolutional neural networks can be improved for small datasets using bottom-up layerwise reinitialization.
- Score: 3.462210753108297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results suggest that reinitializing a subset of the parameters of a
neural network during training can improve generalization, particularly for
small training sets. We study the impact of different reinitialization methods
in several convolutional architectures across 12 benchmark image classification
datasets, analyzing their potential gains and highlighting limitations. We also
introduce a new layerwise reinitialization algorithm that outperforms previous
methods and suggest explanations of the observed improved generalization.
First, we show that layerwise reinitialization increases the margin on the
training examples without increasing the norm of the weights, hence leading to
an improvement in margin-based generalization bounds for neural networks.
Second, we demonstrate that it settles in flatter local minima of the loss
surface. Third, it encourages learning general rules and discourages
memorization by placing emphasis on the lower layers of the neural network. Our
takeaway message is that the accuracy of convolutional neural networks can be
improved for small datasets using bottom-up layerwise reinitialization, where
the number of reinitialized layers may vary depending on the available compute
budget.
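A minimal sketch of how such a bottom-up scheme can be organized, assuming a PyTorch model split into an ordered list of blocks and a user-supplied train_fn; the block boundaries, the number of rounds, and the absence of any rescaling of the retained layers are simplifications, so this is not the authors' exact algorithm.

```python
import torch.nn as nn

def reinitialize(module: nn.Module) -> None:
    # Reset every submodule that provides reset_parameters() (Conv2d, Linear, BatchNorm, ...).
    for m in module.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()

def layerwise_reinit_training(blocks, train_fn, rounds=None):
    """Bottom-up layerwise reinitialization (illustrative sketch).

    blocks   : list of nn.Module, ordered from the input (bottom) to the output (top)
    train_fn : assumed user-supplied routine that trains the model in place
    rounds   : number of bottom-up rounds; fewer rounds means a smaller compute budget
    """
    model = nn.Sequential(*blocks)
    train_fn(model)                      # initial full training
    if rounds is None:
        rounds = len(blocks) - 1
    for k in range(1, rounds + 1):
        for block in blocks[k:]:         # keep the k bottom blocks ...
            reinitialize(block)          # ... and reinitialize everything above them
        train_fn(model)                  # retrain; the retained lower blocks keep their weights
    return model
```

Each round retrains the network while preserving progressively more of the lower layers, which is where the compute/accuracy trade-off mentioned in the abstract enters.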
Related papers
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
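As a concrete illustration of that observation, the NumPy sketch below shifts the weights of a single linear layer within the null space of its input batch, so the layer's output on those inputs is unchanged; the actual method would choose the shift to reach a flatter minimum, whereas the candidate direction D here is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 128))   # batch of 32 inputs with 128 features
W = rng.standard_normal((128, 10))   # layer weights
Y = X @ W                            # layer output to be preserved

# Any update N in the null space of X leaves X @ (W + N) unchanged.
D = rng.standard_normal(W.shape)                  # candidate shift direction (illustrative)
P = np.eye(X.shape[1]) - np.linalg.pinv(X) @ X    # projector onto null(X)
N = P @ D                                         # feasible shift

W_shifted = W + N
print(np.allclose(X @ W_shifted, Y))              # True: input-output map on X is preserved
```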
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
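A hedged PyTorch sketch of the general idea: the gradient of a convolution kernel is rescaled per spatial position via a tensor hook. The scaling map below is invented for illustration and is not the rule derived in the paper.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)

# Illustrative per-position scaling over the 3x3 kernel (center emphasized);
# the paper derives its own scaling, which is not reproduced here.
scale = torch.tensor([[0.5, 1.0, 0.5],
                      [1.0, 2.0, 1.0],
                      [0.5, 1.0, 0.5]])

# Rescale the weight gradient at each spatial kernel position during the backward pass.
conv.weight.register_hook(lambda grad: grad * scale)

x = torch.randn(8, 16, 28, 28)
conv(x).sum().backward()   # conv.weight.grad is now spatially rescaled
```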
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Imbedding Deep Neural Networks [0.0]
Continuous depth neural networks, such as Neural ODEs, have refashioned the understanding of residual neural networks in terms of non-linear vector-valued optimal control problems.
We propose a new approach which explicates the network's 'depth' as a fundamental variable, thus reducing the problem to a system of forward-facing initial value problems.
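To make the continuous-depth view concrete, here is a generic sketch (not this paper's formulation) in which depth plays the role of time and the forward pass is an explicit Euler integration of dh/dt = f(h, t); each Euler step corresponds to one residual block.

```python
import torch
import torch.nn as nn

class ODEVelocity(nn.Module):
    """Right-hand side f(h, t) of the depth ODE dh/dt = f(h, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, h, t):
        t_col = torch.full_like(h[:, :1], t)          # append depth ("time") as an extra input
        return self.net(torch.cat([h, t_col], dim=1))

def euler_depth_integration(f, h0, depth=1.0, steps=10):
    # Treat depth as a continuous variable and integrate forward in Euler steps.
    h, dt = h0, depth / steps
    for i in range(steps):
        h = h + dt * f(h, i * dt)                     # one Euler step == one residual block
    return h

h_final = euler_depth_integration(ODEVelocity(dim=8), torch.randn(4, 8))
```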
arXiv Detail & Related papers (2022-01-31T22:00:41Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize the training risk and fail to generalize well on test data or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via a Polyak-Lojasiewicz condition, smoothness, and standard assumptions.
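The mangled phrase above refers to the Polyak-Lojasiewicz (PL) condition. For context, its standard textbook form (quoted from the optimization literature, not from this paper) is:

```latex
% An objective f with minimal value f^* satisfies the PL inequality with
% constant \mu > 0 if, for all parameters \theta,
\frac{1}{2}\,\bigl\lVert \nabla f(\theta) \bigr\rVert^{2} \;\ge\; \mu \bigl( f(\theta) - f^{*} \bigr).
```

Together with smoothness, the PL inequality gives linear convergence of gradient descent in objective value without requiring convexity, which is why it appears among the assumptions here.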
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- A Weight Initialization Based on the Linear Product Structure for Neural Networks [0.0]
We study neural networks from a nonlinear point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks.
The proposed strategy is derived from the approximation of activation functions, using theories of numerical algebra to guarantee finding all the local minima.
arXiv Detail & Related papers (2021-09-01T00:18:59Z)
- Backward Gradient Normalization in Deep Neural Networks [68.8204255655161]
We introduce a new technique for gradient normalization during neural network training.
The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture.
Results on tests with very deep neural networks show that the new technique can effectively control the gradient norm.
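The mechanism can be sketched with PyTorch tensor hooks: gradients flowing backward through chosen points of the network are rescaled before they continue. The specific rescaling below (unit L2 norm at two hand-picked positions) is an illustrative stand-in, not necessarily the paper's exact rule.

```python
import torch
import torch.nn as nn

def normalize_grad(grad, eps=1e-8):
    # Rescale the gradient arriving at this point to unit L2 norm.
    return grad / (grad.norm() + eps)

layers = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, 10))

h = torch.randn(16, 64)
for i, layer in enumerate(layers):
    h = layer(h)
    if i in (1, 3):                      # "normalization points" inserted after chosen layers
        h.register_hook(normalize_grad)  # applied to the gradient flowing backward here
h.sum().backward()                       # earlier layers now receive the rescaled gradient
```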
arXiv Detail & Related papers (2021-06-17T13:24:43Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
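Tracking a Hessian norm along these lines is commonly done with Hessian-vector products and power iteration, which never forms the Hessian explicitly. The sketch below estimates the spectral norm of the loss Hessian for a toy model; it is a generic recipe, not necessarily this paper's estimator.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
x, y = torch.randn(64, 32), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)   # keep graph for 2nd-order grads

# Power iteration: repeatedly apply the Hessian through Hessian-vector products.
v = [torch.randn_like(p) for p in params]
for _ in range(20):
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))        # scalar g^T v
    hv = torch.autograd.grad(gv, params, retain_graph=True)    # Hessian-vector product H v
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]

print(f"estimated spectral norm of the Hessian: {norm.item():.4f}")
```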
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.