Activation function impact on Sparse Neural Networks
- URL: http://arxiv.org/abs/2010.05943v1
- Date: Mon, 12 Oct 2020 18:05:04 GMT
- Title: Activation function impact on Sparse Neural Networks
- Authors: Adam Dubowski
- Abstract summary: Sparse Evolutionary Training allows for significantly lower computational complexity when compared to fully connected models.
This research provides insights into the relationship between the activation function used and the network performance at various sparsity levels.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the concept of a Sparse Neural Network has been researched for some
time, researchers have only recently made notable progress in the matter.
Techniques like Sparse Evolutionary Training allow for significantly lower
computational complexity when compared to fully connected models by reducing
redundant connections. This reduction typically takes place through an iterative
process of weight creation and removal during network training. Although there have
been numerous approaches to optimizing the redistribution of the removed weights,
there appears to be little or no study of the effect of activation functions on the
performance of Sparse Networks. This research provides insights into
the relationship between the activation function used and the network
performance at various sparsity levels.
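The sketch below is not the paper's code; it is a minimal NumPy illustration, under the usual Sparse Evolutionary Training formulation, of the iterative weight-removal-and-creation cycle described above, with the activation function exposed as the configurable choice whose effect the paper studies. The names and parameters (set_prune_and_regrow, zeta, the 0.01 re-initialisation scale, ACTIVATIONS, sparse_forward) are illustrative assumptions, not taken from the paper.
```python
# Illustrative sketch (assumptions, not the paper's code): one SET-style
# prune-and-regrow step on a layer's weight matrix, plus a sparse forward
# pass with a configurable activation function.
import numpy as np

def set_prune_and_regrow(weights, mask, zeta=0.3, rng=None):
    """Drop the fraction `zeta` of weakest active connections, then regrow
    the same number of connections at random inactive positions."""
    rng = rng or np.random.default_rng(0)
    active = np.flatnonzero(mask)
    n_drop = int(zeta * active.size)
    # remove the smallest-magnitude active weights
    drop = active[np.argsort(np.abs(weights.flat[active]))[:n_drop]]
    mask.flat[drop] = 0
    weights.flat[drop] = 0.0
    # regrow the same number of connections at random inactive positions
    free = np.flatnonzero(mask == 0)
    grow = rng.choice(free, size=n_drop, replace=False)
    mask.flat[grow] = 1
    weights.flat[grow] = rng.normal(scale=0.01, size=n_drop)  # small re-init
    return weights, mask

# The activation function is the variable under study; swap it per experiment.
ACTIVATIONS = {"relu": lambda x: np.maximum(x, 0.0), "tanh": np.tanh}

def sparse_forward(x, weights, mask, activation="relu"):
    return ACTIVATIONS[activation](x @ (weights * mask))
```
In the setting the abstract describes, such a prune-and-regrow step would alternate with ordinary gradient-based training epochs, and the activation passed to sparse_forward would be varied across runs at different sparsity levels.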
Related papers
- Quantifying Emergence in Neural Networks: Insights from Pruning and Training Dynamics [0.0]
Emergence, where complex behaviors develop from the interactions of simpler components within a network, plays a crucial role in enhancing capabilities.
We introduce a quantitative framework to measure emergence during the training process and examine its impact on network performance.
Our hypothesis posits that the degree of emergence, defined by the connectivity between active and inactive nodes, can predict the development of emergent behaviors in the network.
arXiv Detail & Related papers (2024-09-03T03:03:35Z)
- Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference [2.0822643340897273]
We show that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model.
We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task.
arXiv Detail & Related papers (2023-11-13T08:18:44Z)
- Learning Discrete Weights and Activations Using the Local Reparameterization Trick [21.563618480463067]
In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference.
By binarizing the network weights and activations, one can significantly reduce computational complexity.
This leads to more efficient neural network inference that can be deployed on low-resource devices.
arXiv Detail & Related papers (2023-07-04T12:27:10Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- Stimulative Training++: Go Beyond The Performance Limits of Residual Networks [91.5381301894899]
Residual networks have shown great success and become indispensable in recent deep neural network models.
Previous research has suggested that residual networks can be considered as ensembles of shallow networks.
We identify a problem analogous to social loafing, where subnetworks within a residual network are prone to exert less effort when working as part of a group than when working alone.
arXiv Detail & Related papers (2023-05-04T02:38:11Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- A Faster Approach to Spiking Deep Convolutional Neural Networks [0.0]
Spiking neural networks (SNNs) have closer dynamics to the brain than current deep neural networks.
We propose a network structure based on previous work to improve network runtime and accuracy.
arXiv Detail & Related papers (2022-10-31T16:13:15Z)
- Implicit recurrent networks: A novel approach to stationary input processing with recurrent neural networks in deep learning [0.0]
In this work, we introduce and test a novel implementation of recurrent neural networks into deep learning.
We provide an algorithm which implements the backpropagation algorithm on an implicit implementation of recurrent networks.
A single-layer implicit recurrent network is able to solve the XOR problem, while a feed-forward network with a monotonically increasing activation function fails at this task.
arXiv Detail & Related papers (2020-10-20T18:55:32Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.