Depth-Aware Initialization for Stable and Efficient Neural Network Training
- URL: http://arxiv.org/abs/2509.05018v1
- Date: Fri, 05 Sep 2025 11:26:20 GMT
- Title: Depth-Aware Initialization for Stable and Efficient Neural Network Training
- Authors: Vijay Pandey
- Abstract summary: In this paper, a study is carried out in which the depth of each layer, as well as the total network depth, is incorporated into a better initialization scheme. We propose a novel, flexible way to increase the variance across the network that incorporates the depth of each layer.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the past few years, various initialization schemes have been proposed, including Glorot initialization, He initialization, initialization with orthogonal matrices, and the random-walk method. Some of these methods emphasize keeping unit variance of the activations and gradients as they propagate through the network layers. Several of them ignore depth information altogether, while others take the total network depth into account for better initialization. In this paper, a comprehensive study is carried out in which the depth of each individual layer, as well as the total network depth, is incorporated into the initialization scheme. The study also shows that, for deeper networks, the theoretical assumption of unit variance throughout the network does not perform well; instead, the variance needs to increase from the first-layer activations to the last-layer activations. We propose a novel, flexible way to increase the variance across the network that incorporates the depth of each layer. Experiments show that the proposed method performs better than existing initialization schemes.
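The abstract does not spell out the exact scaling rule, so the sketch below is only an illustration of the general idea: start from a He-style per-layer variance and multiply it by a factor that grows with the layer index, so that activation variance increases from the first layer to the last instead of being pinned at unit variance. The function name `depth_aware_init` and the `growth` parameter are assumptions made for this example, not the paper's formula.

```python
# Minimal sketch of a depth-aware initialization in the spirit of the abstract.
# The per-layer factor below (geometric growth of the target variance with the
# layer index) is an illustrative assumption, not the authors' exact rule.
import numpy as np

def depth_aware_init(layer_sizes, growth=1.05, rng=None):
    """Return weight matrices whose variance grows with layer depth.

    layer_sizes : e.g. [784, 256, 256, 10]
    growth      : hypothetical per-layer variance growth factor; values > 1 let
                  the activation variance rise from the first layer to the last
                  instead of staying at the unit-variance fixed point.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = []
    for l, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        base_var = 2.0 / fan_in               # He-style baseline variance
        depth_factor = growth ** (l + 1)      # depends on this layer's depth
        std = np.sqrt(base_var * depth_factor)
        weights.append(rng.normal(0.0, std, size=(fan_in, fan_out)))
    return weights

# Example: inspect how the sampled weight scale changes with depth.
for l, W in enumerate(depth_aware_init([784, 256, 256, 256, 10])):
    print(f"layer {l}: std = {W.std():.4f}")
```

With `growth = 1.0` the sketch reduces to plain He initialization, so the depth-dependent factor is the only ingredient added here.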
Related papers
- Optimized Weight Initialization on the Stiefel Manifold for Deep ReLU Neural Networks [5.363441578662801]
Improper weight initialization in ReLU networks can cause dying-ReLU inactivation and exacerbate instability as network depth increases. We introduce an optimization problem on the Stiefel manifold, thereby preserving scale and calibrating the pre-activation statistics. We show prevention of the dying-ReLU problem, slower decay of activation variance, and mitigation of vanishing gradients, which together stabilize signal and gradient flow in deep architectures.
arXiv Detail & Related papers (2025-08-30T05:17:31Z) - Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis [5.016205338484259]
Increasing a neural network's depth can improve generalization performance. This paper presents a novel weight initialization method for neural networks with the tanh activation function. Experiments on various classification datasets and physics-informed neural networks demonstrate that the proposed method outperforms Xavier initialization (with or without normalization) in terms of robustness across different network sizes.
arXiv Detail & Related papers (2024-10-03T06:30:27Z) - Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization [49.06421851486415]
Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years.
We propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on random Givens rotations; a minimal sketch of the Givens-rotation construction appears after this list.
Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques.
arXiv Detail & Related papers (2024-06-03T19:44:47Z) - Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find good initialization for input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z) - Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - Deep Networks from the Principle of Rate Reduction [32.87280757001462]
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification.
We show that the basic iterative ascent gradient scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer.
All components of this "white box" network have precise optimization, statistical, and geometric interpretation.
arXiv Detail & Related papers (2020-05-16T16:17:37Z) - An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
The proposed SteinGLM method is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
arXiv Detail & Related papers (2020-05-16T16:17:37Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with non-convexity renders learning sensitive to the choice of initialization.
We propose fusing neighboring layers of deep networks that are initialized with random weights.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
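As referenced in the EOI entry above, the construction behind Exact Orthogonal Initialization is a product of random Givens rotations. The sketch below shows only that basic construction: each rotation touches two coordinates, so a small number of rotations yields a sparse matrix that is exactly orthogonal. How EOI chooses the number of rotations for a target density and how it handles convolutional layers is not reproduced here; `givens_orthogonal` and its parameters are names chosen for this example.

```python
# Minimal sketch: compose random Givens rotations into an exactly orthogonal
# matrix, the building block behind Exact Orthogonal Initialization (EOI).
import numpy as np

def givens_orthogonal(n, num_rotations, rng=None):
    """Build an n x n orthogonal matrix as a product of random Givens rotations."""
    rng = np.random.default_rng() if rng is None else rng
    Q = np.eye(n)
    for _ in range(num_rotations):
        i, j = rng.choice(n, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        # Rotate in the (i, j) plane; each rotation modifies only two rows,
        # so few rotations keep the resulting matrix sparse.
        row_i, row_j = Q[i].copy(), Q[j].copy()
        Q[i] = c * row_i - s * row_j
        Q[j] = s * row_i + c * row_j
    return Q

Q = givens_orthogonal(64, num_rotations=128)
print("orthogonality error:", np.abs(Q @ Q.T - np.eye(64)).max())
print("fraction of nonzeros:", (np.abs(Q) > 1e-12).mean())
```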