A new initialisation to Control Gradients in Sinusoidal Neural network
- URL: http://arxiv.org/abs/2512.06427v1
- Date: Sat, 06 Dec 2025 13:23:03 GMT
- Title: A new initialisation to Control Gradients in Sinusoidal Neural network
- Authors: Andrea Combette, Antoine Venaille, Nelly Pustelnik
- Abstract summary: We propose a new initialisation for networks with sinusoidal activation functions such as \texttt{SIREN}. Controlling both gradients and targeting vanishing pre-activations helps prevent the emergence of inappropriate frequencies during estimation. The new initialisation consistently outperforms state-of-the-art methods across a wide range of reconstruction tasks.
- Score: 9.341735544356167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A proper initialisation strategy is of primary importance to mitigate gradient explosion or vanishing when training neural networks. Yet, the impact of initialisation parameters still lacks a precise theoretical understanding for several well-established architectures. Here, we propose a new initialisation for networks with sinusoidal activation functions such as \texttt{SIREN}, focusing on gradient control, its scaling with network depth, and its impact on training and generalization. To achieve this, we identify a closed-form expression for the initialisation of the parameters, differing from the original \texttt{SIREN} scheme. This expression is derived from fixed points obtained through the convergence of the pre-activation distribution and the variance of Jacobian sequences. Controlling both gradients and targeting vanishing pre-activations helps prevent the emergence of inappropriate frequencies during estimation, thereby improving generalization. We further show that this initialisation strongly influences training dynamics through the Neural Tangent Kernel (NTK) framework. Finally, we benchmark \texttt{SIREN} with the proposed initialisation against the original scheme and other baselines on function fitting and image reconstruction. The new initialisation consistently outperforms state-of-the-art methods across a wide range of reconstruction tasks, including those involving physics-informed neural networks.
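For context, the baseline this paper modifies is the original SIREN scheme, where the first layer is drawn uniformly in [-1/fan_in, 1/fan_in] and deeper layers in [-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0], with omega_0 = 30 suggested by the SIREN authors. A minimal sketch of that baseline (not the paper's new closed-form initialisation, which is not reproduced in this abstract):

```python
import numpy as np

def siren_init(layer_sizes, omega_0=30.0, rng=None):
    """Original SIREN initialisation (Sitzmann et al.), returned as (W, b) pairs.

    First layer:   W ~ U(-1/fan_in, 1/fan_in)
    Deeper layers: W ~ U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0),
    chosen so sin(omega_0 * (W x + b)) keeps pre-activation variance stable.
    """
    rng = rng or np.random.default_rng(0)
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / omega_0
        W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
        b = rng.uniform(-bound, bound, size=fan_out)
        params.append((W, b))
    return params

def siren_forward(params, x, omega_0=30.0):
    """Forward pass: sinusoidal activations on all but the last layer."""
    h = x
    for W, b in params[:-1]:
        h = np.sin(omega_0 * (W @ h + b))
    W, b = params[-1]
    return W @ h + b

# Example: a 2 -> 64 -> 64 -> 1 coordinate network, as used for image fitting.
params = siren_init([2, 64, 64, 1])
y = siren_forward(params, np.array([0.5, -0.5]))
```

The paper's contribution replaces the uniform bounds above with a closed-form expression derived from fixed points of the pre-activation distribution and the Jacobian variance; the sketch only shows the baseline being compared against.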
Related papers
- A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning [51.505728136705564]
We develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks. We find that different initialization choices place the network into four distinct fine-tuning regimes. A smaller scale in earlier layers enables the network to both reuse and refine its features, leading to superior generalization.
arXiv Detail & Related papers (2026-02-23T17:19:33Z) - Path-conditioned training: a principled way to rescale ReLU neural networks [15.875889029027915]
We build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion to rescale neural network parameters, and derive an efficient algorithm to perform this alignment.
arXiv Detail & Related papers (2026-02-23T12:55:48Z) - Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data [52.737775129027575]
We show that gradient flow on a two-layer ReLU network for classifying orthogonally separable data provably exhibits Neural Collapse (NC). We reveal the role of the implicit bias of the training dynamics in facilitating the emergence of NC.
arXiv Detail & Related papers (2025-10-24T01:36:19Z) - Optimized Weight Initialization on the Stiefel Manifold for Deep ReLU Neural Networks [5.363441578662801]
Improper weight initialization of ReLU networks can cause unit inactivation (dying ReLU) and exacerbate instability as network depth increases. We introduce an optimization problem on the Stiefel manifold, thereby preserving scale and calibrating the pre-activation statistics. We show prevention of the dying ReLU problem, slower decay of activation variance, and mitigation of gradient vanishing, which together stabilize signal and gradient flow in deep architectures.
arXiv Detail & Related papers (2025-08-30T05:17:31Z) - On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions [0.0]
We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost.
In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk.
arXiv Detail & Related papers (2023-11-07T08:20:31Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z) - The Impact of Reinitialization on Generalization in Convolutional Neural Networks [3.462210753108297]
We study the impact of different reinitialization methods in several convolutional architectures across 12 benchmark image classification datasets.
We introduce a new layerwise reinitialization algorithm that outperforms previous methods.
Our takeaway message is that the accuracy of convolutional neural networks can be improved for small datasets using bottom-up layerwise reinitialization.
arXiv Detail & Related papers (2021-09-01T09:25:57Z) - On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks [1.0323063834827415]
We present a novel analysis of single-hidden-layer linear networks trained under gradient flow.
We show that the squared loss converges exponentially to its optimum.
We derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
arXiv Detail & Related papers (2021-05-13T15:13:51Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.