Sensitivity-Based Layer Insertion for Residual and Feedforward Neural
Networks
- URL: http://arxiv.org/abs/2311.15995v1
- Date: Mon, 27 Nov 2023 16:44:13 GMT
- Title: Sensitivity-Based Layer Insertion for Residual and Feedforward Neural
Networks
- Authors: Evelyn Herberg and Roland Herzog and Frederik K\"ohne and Leonie Kreis
and Anton Schiela
- Abstract summary: Training of neural networks requires tedious and often manual tuning of the network architecture.
We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training.
- Score: 0.3831327965422187
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The training of neural networks requires tedious and often manual tuning of
the network architecture. We propose a systematic method to insert new layers
during the training process, which eliminates the need to choose a fixed
network size before training. Our technique borrows techniques from constrained
optimization and is based on first-order sensitivity information of the
objective with respect to the virtual parameters that additional layers, if
inserted, would offer. We consider fully connected feedforward networks with
selected activation functions as well as residual neural networks. In numerical
experiments, the proposed sensitivity-based layer insertion technique exhibits
improved training decay, compared to not inserting the layer. Furthermore, the
computational effort is reduced in comparison to inserting the layer from the
beginning. The code is available at
\url{https://github.com/LeonieKreis/layer_insertion_sensitivity_based}.
Related papers
- Opening the Black Box: predicting the trainability of deep neural networks with reconstruction entropy [0.0]
We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks.
For both the MNIST and CIFAR10 datasets, we show that a single epoch of training is sufficient to predict the trainability of the deep feedforward network.
arXiv Detail & Related papers (2024-06-13T18:00:05Z) - Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
arXiv Detail & Related papers (2024-03-21T08:30:44Z) - NeuralFastLAS: Fast Logic-Based Learning from Raw Data [54.938128496934695]
Symbolic rule learners generate interpretable solutions, however they require the input to be encoded symbolically.
Neuro-symbolic approaches overcome this issue by mapping raw data to latent symbolic concepts using a neural network.
We introduce NeuralFastLAS, a scalable and fast end-to-end approach that trains a neural network jointly with a symbolic learner.
arXiv Detail & Related papers (2023-10-08T12:33:42Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Improving the Trainability of Deep Neural Networks through Layerwise
Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z) - Neural Maximum A Posteriori Estimation on Unpaired Data for Motion
Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
The proposed NeurMAP is an approach to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z) - Backward Gradient Normalization in Deep Neural Networks [68.8204255655161]
We introduce a new technique for gradient normalization during neural network training.
The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture.
Results on tests with very deep neural networks show that the new technique can do an effective control of the gradient norm.
arXiv Detail & Related papers (2021-06-17T13:24:43Z) - Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs)
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.