A note on regularised NTK dynamics with an application to PAC-Bayesian
training
- URL: http://arxiv.org/abs/2312.13259v1
- Date: Wed, 20 Dec 2023 18:36:05 GMT
- Title: A note on regularised NTK dynamics with an application to PAC-Bayesian
training
- Authors: Eugenio Clerico, Benjamin Guedj
- Abstract summary: We study the evolution of neural networks trained to optimise generalisation objectives such as PAC-Bayes bounds.
This setting provides an appropriate framework to study the evolution of wide networks trained on such objectives.
- Score: 18.97829627151844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish explicit dynamics for neural networks whose training objective
has a regularising term that constrains the parameters to remain close to their
initial value. This keeps the network in a lazy training regime, where the
dynamics can be linearised around the initialisation. The standard neural
tangent kernel (NTK) governs the evolution during training in the
infinite-width limit, although the regularisation causes an additional term to
appear in the differential equation describing the dynamics. This setting
provides an appropriate framework to study the evolution of wide networks
trained to optimise generalisation objectives such as PAC-Bayes bounds, and
hence potentially contribute to a deeper theoretical understanding of such
networks.
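As a minimal sketch of the dynamics described above (assuming the regulariser is an L2 proximity penalty (lambda/2)*||theta_t - theta_0||^2; the paper's actual objective, e.g. a PAC-Bayes bound, may use a different penalty and constants), the parameter gradient flow reads

\[ \dot{\theta}_t = -\nabla_\theta L(f_{\theta_t}) - \lambda\,(\theta_t - \theta_0), \]

and, linearising the network around its initialisation in the lazy regime, the function-space evolution at an input x becomes

\[ \dot{f}_t(x) = -\Theta_0(x, X)\,\nabla_{f(X)} L\big(f_t(X)\big) - \lambda\,\big(f_t(x) - f_0(x)\big), \]

where \Theta_0 is the NTK at initialisation and X denotes the training inputs. The last term is the additional contribution of the regularisation to the standard NTK differential equation.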
Related papers
- Leveraging chaos in the training of artificial neural networks [3.379574469735166]
We explore the dynamics of the neural network trajectory along training for unconventionally large learning rates. We show that, for a region of values of the learning rate, the GD optimization shifts away from a purely exploitation-like algorithm into a regime of exploration-exploitation balance.
arXiv Detail & Related papers (2025-06-10T07:41:58Z) - Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective [125.00228936051657]
We introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features.
By fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
arXiv Detail & Related papers (2024-07-24T09:30:04Z) - An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network [10.384951432591492]
Recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks.
We show that this infinite-width analysis can be extended to the Jacobian of a deep neural network.
We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of kernel regression solution to obtain an insight into Jacobian regularisation.
arXiv Detail & Related papers (2023-12-06T09:52:18Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Identifying Equivalent Training Dynamics [3.793387630509845]
We develop a framework for identifying conjugate and non-conjugate training dynamics.
By leveraging advances in Koopman operator theory, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent.
We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking.
arXiv Detail & Related papers (2023-02-17T22:15:20Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via a Polyak-Lojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Edge of chaos as a guiding principle for modern neural network training [19.419382003562976]
We study the role of various hyperparameters in modern neural network training algorithms in terms of the order-chaos phase diagram.
In particular, we study a fully analytical feedforward neural network trained on the widely adopted Fashion-MNIST dataset.
arXiv Detail & Related papers (2021-07-20T12:17:55Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - On the Neural Tangent Kernel of Deep Networks with Orthogonal
Initialization [18.424756271923524]
We study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs).
arXiv Detail & Related papers (2020-04-13T11:12:53Z) - Input-to-State Representation in linear reservoirs dynamics [15.491286626948881]
Reservoir computing is a popular approach to design recurrent neural networks.
The working principle of these networks is not fully understood.
A novel analysis of the dynamics of such networks is proposed.
arXiv Detail & Related papers (2020-03-24T00:14:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.