Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
- URL: http://arxiv.org/abs/2407.19044v2
- Date: Thu, 8 Aug 2024 21:46:07 GMT
- Title: Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
- Authors: Johnny Jingze Li, Vivek Kurien George, Gabriel A. Silva,
- Abstract summary: We introduce a novel yet straightforward neural network initialization scheme.
Inspired by the concept of emergence and leveraging the emergence measures proposed by Li (2023), our method adjusts layer-wise weight scaling factors to achieve higher emergence values.
We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel yet straightforward neural network initialization scheme that modifies conventional methods like Xavier and Kaiming initialization. Inspired by the concept of emergence and leveraging the emergence measures proposed by Li (2023), our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. This enhancement is easy to implement, requiring no additional optimization steps for initialization compared to GradInit. We evaluate our approach across various architectures, including MLP and convolutional architectures for image recognition, and transformers for machine translation. We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization. The simplicity, theoretical innovation, and demonstrable empirical advantages of our method make it a potent enhancement to neural network initialization practices. These results suggest a promising direction for leveraging emergence to improve neural network training methodologies. Code is available at: https://github.com/johnnyjingzeli/EmergenceInit.
Related papers
- Novel Saliency Analysis for the Forward Forward Algorithm [0.0]
We introduce the Forward Forward algorithm into neural network training.
This method involves executing two forward passes the first with actual data to promote positive reinforcement, and the second with synthetically generated negative data to enable discriminative learning.
To overcome the limitations inherent in traditional saliency techniques, we developed a bespoke saliency algorithm specifically tailored for the Forward Forward framework.
arXiv Detail & Related papers (2024-09-18T17:21:59Z) - Adaptive Class Emergence Training: Enhancing Neural Network Stability and Generalization through Progressive Target Evolution [0.0]
We propose a novel training methodology for neural networks in classification problems.
We evolve the target outputs from a null vector to one-hot encoded vectors throughout the training process.
This gradual transition allows the network to adapt more smoothly to the increasing complexity of the classification task.
arXiv Detail & Related papers (2024-09-04T03:25:48Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata viaak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture method for initializing neural networks.
It is based on a simple agnostic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - A Differential Game Theoretic Neural Optimizer for Training Residual
Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a gameoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.