IDInit: A Universal and Stable Initialization Method for Neural Network Training
- URL: http://arxiv.org/abs/2503.04626v2
- Date: Sun, 09 Mar 2025 16:31:31 GMT
- Title: IDInit: A Universal and Stable Initialization Method for Neural Network Training
- Authors: Yu Pan, Chaozheng Wang, Zekai Wu, Qifan Wang, Min Zhang, Zenglin Xu,
- Abstract summary: Methods that maintain identity transition within layers have shown good efficiency in network training. We introduce IDInit, a novel method that preserves identity in both the main and sub-stem layers of residual networks.
- Score: 44.542599968374205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have achieved remarkable accomplishments in practice. The success of these networks hinges on effective initialization methods, which are vital for ensuring stable and rapid convergence during training. Recently, initialization methods that maintain identity transition within layers have shown good efficiency in network training. These techniques (e.g., Fixup) set specific weights to zero to achieve identity control. However, the settings of the remaining weights (e.g., Fixup uses random values to initialize the non-zero weights) affect the inductive bias that is achieved only by the zero weights, which may be harmful to training. To address this concern, we introduce fully identical initialization (IDInit), a novel method that preserves identity in both the main and sub-stem layers of residual networks. IDInit employs a padded identity-like matrix to overcome rank constraints in non-square weight matrices. Furthermore, we show that the convergence problem of an identity matrix can be solved by stochastic gradient descent. Additionally, we enhance the universality of IDInit by processing higher-order weights and addressing dead-neuron problems. IDInit is a straightforward yet effective initialization method, with improved convergence, stability, and performance across various settings, including large-scale datasets and deep models.
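The abstract does not include code; the NumPy sketch below shows one plausible reading of the "padded identity-like matrix" for a non-square weight. The function name `identity_like_init` and the tiling construction are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def identity_like_init(out_dim, in_dim):
    """Padded identity-like matrix (illustrative reading of the abstract).

    A truncated identity on a non-square weight is rank-limited; tiling the
    identity block so that every row (or column) carries a unit entry is one
    way to pad it. The paper's exact construction may differ.
    """
    W = np.zeros((out_dim, in_dim))
    if out_dim >= in_dim:
        for r in range(out_dim):       # tile identity blocks down the rows
            W[r, r % in_dim] = 1.0
    else:
        for c in range(in_dim):        # tile identity blocks across the columns
            W[c % out_dim, c] = 1.0
    return W

# Per the abstract, both main- and sub-stem weights of a residual block are
# initialized identity-like, rather than zeroing some of them as Fixup does.
W = identity_like_init(512, 256)       # non-square layer weight
```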
Related papers
- Find A Winning Sign: Sign Is All We Need to Win the Lottery [52.63674911541416]
We show that a sparse network trained by an existing IP method can retain its basin of attraction if its parameter signs and normalization layer parameters are preserved.
To take a step closer to finding a winning ticket, we alleviate the reliance on normalization layer parameters by preventing high error barriers along the linear path between the sparse network trained by our method and its counterpart with normalization layer parameters.
arXiv Detail & Related papers (2025-04-07T09:30:38Z) - Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis [5.016205338484259]
Increasing a neural network's depth can improve generalization performance. This paper presents a novel weight initialization method for neural networks with the tanh activation function. Experiments on various classification datasets and physics-informed neural networks demonstrate that the proposed method outperforms Xavier initialization (with or without normalization) in terms of robustness across different network sizes.
arXiv Detail & Related papers (2024-10-03T06:30:27Z) - Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization [49.06421851486415]
Static sparse training aims to train sparse models from scratch and has achieved remarkable results in recent years.
We propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on random Givens rotations.
Our method enables training highly sparse 1000-layer networks and CNNs without residual connections or normalization techniques.
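The EOI summary above is terse; as a rough, hedged illustration of the underlying idea (not the authors' implementation), an exactly orthogonal matrix can be assembled as a product of random Givens rotations, with the rotation count controlling how sparse the result stays:

```python
import numpy as np

def givens_orthogonal(n, num_rotations, seed=None):
    """Compose random Givens rotations into an exactly orthogonal n x n matrix.

    Each rotation mixes only two coordinates, so few rotations keep the product
    sparse while many densify it. This is a generic sketch of Givens-based
    orthogonal initialization, not the EOI algorithm itself.
    """
    rng = np.random.default_rng(seed)
    Q = np.eye(n)
    for _ in range(num_rotations):
        i, j = rng.choice(n, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = -s, s
        Q = G @ Q
    return Q

Q = givens_orthogonal(64, num_rotations=128, seed=0)
assert np.allclose(Q @ Q.T, np.eye(64))        # orthogonal up to float error
density = np.mean(np.abs(Q) > 1e-12)           # fraction of nonzero entries
```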
arXiv Detail & Related papers (2024-06-03T19:44:47Z) - Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders [0.0]
We introduce a novel weight initialisation technique called the Straddled Matrix Initialiser.
The combination of the Straddled Matrix and the ReLU activation function initialises a neural network as a de facto linear model.
In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
arXiv Detail & Related papers (2023-11-17T18:43:32Z) - From Pointwise to Powerhouse: Initialising Neural Networks with Generative Models [1.1807848705528714]
In this paper, we introduce two groups of new initialisation methods.
First, we locally initialise weight groups by employing variational autoencoders.
Secondly, we globally initialise full weight sets by employing graph hypernetworks.
arXiv Detail & Related papers (2023-10-25T15:06:32Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
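As a hedged illustration of what tracking the Hessian norm can look like in practice (not the cited paper's estimator), the dominant Hessian eigenvalue can be approximated by power iteration on Hessian-vector products via double backpropagation in PyTorch:

```python
import torch

def hessian_spectral_norm(loss_fn, params, num_iters=20):
    """Estimate the dominant Hessian eigenvalue magnitude by power iteration.

    Generic diagnostic sketch using Hessian-vector products from double
    backprop; the cited paper's curvature estimator may differ.
    """
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(num_iters):
        norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
        v = [vi / norm for vi in v]                               # normalize the probe vector
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)   # Hessian-vector product
        v = [hvi.detach() for hvi in hv]
    return torch.sqrt(sum((vi ** 2).sum() for vi in v)).item()

# Hypothetical usage:
# model = torch.nn.Linear(10, 10)
# x, y = torch.randn(32, 10), torch.randn(32, 10)
# loss_fn = lambda: torch.nn.functional.mse_loss(model(x), y)
# print(hessian_spectral_norm(loss_fn, list(model.parameters())))
```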
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with the nonconvexity of the underlying problem renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.