Sinusoidal Initialization, Time for a New Start
- URL: http://arxiv.org/abs/2505.12909v2
- Date: Tue, 20 May 2025 15:54:36 GMT
- Title: Sinusoidal Initialization, Time for a New Start
- Authors: Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí
- Abstract summary: Initialization plays a critical role in Deep Neural Network training, directly influencing convergence, stability, and generalization. We introduce a novel deterministic method that employs sinusoidal functions to construct structured weight matrices to improve the spread and balance of weights throughout the network. Our experiments show an increase of 4.9% in final validation accuracy and 20.9% in convergence speed.
- Score: 0.5242869847419834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Initialization plays a critical role in Deep Neural Network training, directly influencing convergence, stability, and generalization. Common approaches such as Glorot and He initializations rely on randomness, which can produce uneven weight distributions across layer connections. In this paper, we introduce the Sinusoidal initialization, a novel deterministic method that employs sinusoidal functions to construct structured weight matrices expressly to improve the spread and balance of weights throughout the network while simultaneously fostering a more uniform, well-conditioned distribution of neuron activation states from the very first forward pass. Because Sinusoidal initialization begins with weights and activations that are already evenly and efficiently utilized, it delivers consistently faster convergence, greater training stability, and higher final accuracy across a wide range of models, including convolutional neural networks, vision transformers, and large language models. On average, our experiments show an increase of 4.9% in final validation accuracy and 20.9% in convergence speed. By replacing randomness with structure, this initialization provides a stronger and more reliable foundation for Deep Learning systems.
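The abstract describes building structured weight matrices from sinusoidal functions but does not spell out the construction. The snippet below is therefore only a minimal sketch of how a deterministic, sinusoidal weight matrix might be built, assuming a per-row frequency and phase schedule plus a He-style rescaling; the function name `sinusoidal_init` and its parameters are illustrative, not the authors' published formula.

```python
import math
import torch

def sinusoidal_init(fan_out: int, fan_in: int, gain: float = 1.0) -> torch.Tensor:
    """Illustrative deterministic sinusoidal initializer (not the paper's exact formula)."""
    rows = torch.arange(fan_out, dtype=torch.float32).unsqueeze(1)  # (fan_out, 1)
    cols = torch.arange(fan_in, dtype=torch.float32).unsqueeze(0)   # (1, fan_in)

    # Each row (output neuron) gets its own frequency and phase, so the entries
    # are spread evenly over [-1, 1] in a structured, non-random pattern.
    freq = 2.0 * math.pi * (rows + 1.0) / fan_in
    phase = math.pi * rows / fan_out
    w = torch.sin(freq * cols + phase)

    # Rescale so the empirical standard deviation matches a He-style target,
    # keeping activation variance roughly constant across layers.
    target_std = gain * math.sqrt(2.0 / fan_in)
    return w * (target_std / w.std())

# Usage: overwrite the default random init of a linear layer.
layer = torch.nn.Linear(256, 128)
with torch.no_grad():
    layer.weight.copy_(sinusoidal_init(layer.out_features, layer.in_features))
```

The row-wise sine pattern and rescaling are meant only to illustrate the property the abstract emphasizes: weights that are deterministic yet evenly spread from the very first forward pass.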
Related papers
- MILP initialization for solving parabolic PDEs with PINNs [2.5932373010465364]
Physics-Informed Neural Networks (PINNs) are a powerful deep learning method capable of providing solutions and parameter estimations of physical systems. Given the complexity of their neural network structure, the convergence speed is still limited compared to numerical methods.
arXiv Detail & Related papers (2025-01-27T15:46:38Z)
- Fast Training of Sinusoidal Neural Fields via Scaling Initialization [16.912112402718584]
We focus on a popular family of neural fields, called sinusoidal neural fields (SNFs). We show that by simply multiplying each weight by a constant, we can accelerate SNF training by 10x; a minimal sketch of this weight-scaling idea appears after this list.
arXiv Detail & Related papers (2024-10-07T06:38:43Z)
- Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks [1.5124439914522694]
We introduce a theoretical framework that explains the capacity property of sinusoidal networks. We show how their layer compositions produce a large number of new frequencies, expressed as integer combinations of the input frequencies. Our method, referred to as TUNER, greatly improves the stability and convergence of sinusoidal INR training, leading to detailed reconstructions.
arXiv Detail & Related papers (2024-07-30T18:24:46Z)
- Stabilizing RNN Gradients through Pre-training [3.335932527835653]
Learning theory suggests preventing the gradient from growing exponentially with depth or time in order to stabilize and improve training.
We extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution.
We propose a new approach to mitigate this issue, which consists of giving a weight of one half to the time and depth contributions to the gradient.
arXiv Detail & Related papers (2023-08-23T11:48:35Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
- Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks [39.856439772974454]
We show that the width needed for efficient convergence to a global minimum is independent of the depth.
Our results suggest an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
arXiv Detail & Related papers (2020-01-16T18:48:34Z)
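The "Fast Training of Sinusoidal Neural Fields via Scaling Initialization" entry above describes multiplying each weight by a constant at initialization. The snippet below is a minimal, hedged sketch of that idea on a SIREN-style linear layer; the `omega0` frequency, the `scale` constant, and the choice to skip the first layer are illustrative assumptions, not the authors' exact recipe.

```python
import math
import torch

def scaled_siren_init(layer: torch.nn.Linear, omega0: float = 30.0,
                      scale: float = 2.0, is_first: bool = False) -> None:
    """SIREN-style uniform init, then multiply the weights by a constant (illustrative)."""
    fan_in = layer.in_features
    # Standard SIREN bounds: wider for the first layer, 1/omega0-scaled otherwise.
    bound = 1.0 / fan_in if is_first else math.sqrt(6.0 / fan_in) / omega0
    with torch.no_grad():
        layer.weight.uniform_(-bound, bound)
        if not is_first:
            # The "multiply each weight by a constant" step from the blurb above.
            layer.weight.mul_(scale)

# Usage on a hidden layer of a sinusoidal MLP (activations would be sin(omega0 * x)).
hidden = torch.nn.Linear(256, 256)
scaled_siren_init(hidden, is_first=False)
```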
This list is automatically generated from the titles and abstracts of the papers in this site.