Optimizing Performance of Feedforward and Convolutional Neural Networks
through Dynamic Activation Functions
- URL: http://arxiv.org/abs/2308.05724v2
- Date: Sun, 18 Feb 2024 21:53:54 GMT
- Title: Optimizing Performance of Feedforward and Convolutional Neural Networks
through Dynamic Activation Functions
- Authors: Chinmay Rane, Kanishka Tyagi, Michael Manry
- Abstract summary: Deep learning training algorithms have been a huge success in recent years in many fields including speech, text, image and video.
Deeper and deeper layers are proposed with huge success, with ResNet structures having around 152 layers.
Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning training algorithms have been a huge success in recent
years in many fields including speech, text, image and video. Deeper and deeper
layers are proposed with huge success, with ResNet structures having around 152
layers. Shallow convolutional neural networks (CNNs) are still an active research
area, where some phenomena remain unexplained. The activation functions used in a
network are of utmost importance, as they provide its non-linearity. ReLUs are the
most commonly used activation functions. We present a complex piece-wise linear
(PWL) activation for the hidden layers, and show that these PWL activations work
much better than ReLU activations in our networks, for both convolutional neural
networks and multilayer perceptrons. Result comparisons in PyTorch for shallow
and deep CNNs are given to further strengthen our case.
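The abstract does not spell out the PWL parameterization, so the following PyTorch sketch is an illustration only: a trainable piece-wise linear activation built from a linear term plus hinge (ReLU) terms at fixed break points. The module name, the number of segments and the hinge placement are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PiecewiseLinear(nn.Module):
    """Illustrative trainable piece-wise linear activation:
    f(x) = a*x + b + sum_k c_k * relu(x - t_k), with fixed, evenly spaced
    hinge points t_k and learnable coefficients. This is a generic PWL
    construction, not necessarily the parameterization used in the paper."""

    def __init__(self, num_hinges: int = 5, x_min: float = -2.0, x_max: float = 2.0):
        super().__init__()
        # Fixed hinge locations; only the coefficients below are learned.
        self.register_buffer("hinges", torch.linspace(x_min, x_max, num_hinges))
        self.slope = nn.Parameter(torch.ones(1))
        self.bias = nn.Parameter(torch.zeros(1))
        self.coeffs = nn.Parameter(torch.zeros(num_hinges))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast x against the hinge points: (..., 1) - (num_hinges,) -> (..., num_hinges)
        hinge_terms = torch.relu(x.unsqueeze(-1) - self.hinges)
        return self.slope * x + self.bias + hinge_terms @ self.coeffs

# Usage: drop-in replacement for nn.ReLU() in an MLP or CNN.
mlp = nn.Sequential(nn.Linear(784, 128), PiecewiseLinear(), nn.Linear(128, 10))
```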
Related papers
- Activations Through Extensions: A Framework To Boost Performance Of Neural Networks [6.302159507265204]
Activation functions are non-linearities in neural networks that allow them to learn complex mappings between inputs and outputs.
We propose a framework that unifies several works on activation functions and theoretically explains the performance benefits of these works.
arXiv Detail & Related papers (2024-08-07T07:36:49Z)
- Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning [51.386945803485084]
We focus on the task where the agent needs to learn multi-dimensional deterministic policies for control.
Most existing spike-based RL methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully-connected layer.
To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects.
arXiv Detail & Related papers (2024-01-09T07:31:34Z)
- Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions [3.04585143845864]
We show that as networks get deeper and deeper, they are more susceptible to becoming degenerate.
We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture.
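The degeneracy predictor itself is not described in the summary; the snippet below is only an empirical illustration of the phenomenon, showing how the angle between the hidden representations of two random inputs collapses toward zero as a randomly initialized fully connected ReLU network gets deeper (widths, dimensions and the function name are our choices).

```python
import torch

def hidden_angle(depth: int, width: int = 512, dim: int = 64) -> float:
    """Angle (radians) between the hidden representations of two random inputs
    after `depth` random ReLU layers. Shrinking angles with depth are one
    symptom of degeneracy; this is not the paper's prediction algorithm."""
    x, y = torch.randn(dim), torch.randn(dim)
    in_features = dim
    for _ in range(depth):
        w = torch.randn(width, in_features) * (2.0 / in_features) ** 0.5  # He init
        x, y = torch.relu(w @ x), torch.relu(w @ y)
        in_features = width
    cos = torch.dot(x, y) / (x.norm() * y.norm() + 1e-12)
    return torch.arccos(cos.clamp(-1.0, 1.0)).item()

for d in (1, 5, 20, 50):
    print(d, hidden_angle(d))  # the angle shrinks as depth grows
```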
arXiv Detail & Related papers (2023-06-02T13:02:52Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
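For reference, a generic weight-decay regularized training problem with threshold activations can be written as below; the notation is ours, and the paper's convex reformulation (obtained when the data can be shattered at some layer) is not reproduced here.

```latex
% Weight-decay regularized training with threshold activations (our notation):
\[
\min_{\theta}\; \sum_{i=1}^{n} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr)
  + \lambda \lVert \theta \rVert_2^2,
\qquad
f_{\theta}(x) = W_L\, \sigma\bigl(W_{L-1} \cdots \sigma(W_1 x)\bigr),
\qquad
\sigma(t) = \mathbf{1}\{t > 0\}.
\]
```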
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Most Activation Functions Can Win the Lottery Without Excessive Depth [6.68999512375737]
The lottery ticket hypothesis has highlighted the potential of training deep neural networks by pruning.
For networks with ReLU activation functions, it has been proven that a target network of depth $L$ can be approximated by a subnetwork of a randomly initialized neural network that has double the target's depth, $2L$, and is wider by a logarithmic factor.
arXiv Detail & Related papers (2022-05-04T20:51:30Z)
- Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal minimax rates.
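A ridge function depends on its input only through a one-dimensional projection, and an additive ridge function is a finite sum of such terms; in our notation (the summary does not state the exact target class) this reads:

```latex
% Additive ridge function target class (our notation, for illustration):
\[
f(x) = \sum_{j=1}^{J} g_j\bigl(\xi_j \cdot x\bigr),
\qquad x \in \mathbb{R}^d,\; \xi_j \in \mathbb{R}^d,
\]
% with each univariate g_j suitably smooth; the claim is that a CNN followed by
% one fully connected ReLU layer attains minimax-optimal rates for such targets.
```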
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
- Layer Folding: Neural Network Depth Reduction using Activation Linearization [0.0]
Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on networks' depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one.
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
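When the activation between two linear layers is removed (i.e. becomes the identity), the pair composes into a single linear layer. The PyTorch sketch below shows only that folding step; the learned criterion for deciding which activations can be removed is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn as nn

def fold_linear_pair(first: nn.Linear, second: nn.Linear) -> nn.Linear:
    """Fold second(first(x)) into a single Linear layer. Valid only when the
    activation between the two layers has been removed (is the identity)."""
    folded = nn.Linear(first.in_features, second.out_features)
    with torch.no_grad():
        # second(first(x)) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
        folded.weight.copy_(second.weight @ first.weight)
        folded.bias.copy_(second.weight @ first.bias + second.bias)
    return folded

# Quick check that the folded layer matches the two-layer composition.
a, b = nn.Linear(16, 32), nn.Linear(32, 8)
x = torch.randn(4, 16)
assert torch.allclose(b(a(x)), fold_linear_pair(a, b)(x), atol=1e-5)
```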
arXiv Detail & Related papers (2021-06-17T08:22:46Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions.
$\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
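As a rough illustration of a polynomial expansion layer (not the authors' exact tensor factorization), a second-degree block can combine a linear term with a Hadamard product of two linear transformations of the input:

```python
import torch
import torch.nn as nn

class SecondDegreeBlock(nn.Module):
    """Illustrative second-degree polynomial block: linear term plus a Hadamard
    product of two linear maps of the input. Pi-Nets use a specific tensor
    factorization to reach higher degrees; this only conveys the basic idea."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.lin = nn.Linear(dim_in, dim_out)
        self.u = nn.Linear(dim_in, dim_out, bias=False)
        self.v = nn.Linear(dim_in, dim_out, bias=False)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Degree-2 terms in z arise from the elementwise product u(z) * v(z).
        return self.lin(z) + self.u(z) * self.v(z)

block = SecondDegreeBlock(64, 64)
print(block(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```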
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
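One way such a curriculum can be realized is to convolve each feature map with a Gaussian kernel whose standard deviation is annealed toward zero during training, so early epochs see heavily smoothed features and later epochs see the unfiltered ones. The kernel size, schedule and helper names below are our choices, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma: float, size: int = 5) -> torch.Tensor:
    """Normalized 2D Gaussian kernel of shape (1, 1, size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def smooth_feature_maps(x: torch.Tensor, sigma: float) -> torch.Tensor:
    """Depthwise low-pass filter over a (N, C, H, W) feature map; skip the
    blur once sigma has been annealed (close) to zero."""
    if sigma < 1e-3:
        return x
    c = x.shape[1]
    kernel = gaussian_kernel(sigma).to(x.device, x.dtype).repeat(c, 1, 1, 1)
    return F.conv2d(x, kernel, padding=kernel.shape[-1] // 2, groups=c)

# Example anneal schedule: sigma decays linearly from 1.0 to 0 over training.
feats = torch.randn(2, 16, 32, 32)
for epoch in range(3):
    sigma = max(0.0, 1.0 - epoch / 2)
    out = smooth_feature_maps(feats, sigma)
```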
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.