Improving Classification Neural Networks by using Absolute activation
function (MNIST/LeNET-5 example)
- URL: http://arxiv.org/abs/2304.11758v1
- Date: Sun, 23 Apr 2023 22:17:58 GMT
- Title: Improving Classification Neural Networks by using Absolute activation
function (MNIST/LeNET-5 example)
- Authors: Oleg I. Berngardt
- Abstract summary: It is shown that in deep networks Absolute activation does not cause vanishing or exploding gradients, and therefore Absolute activation can be used in both simple and deep neural networks.
It is shown that solving the MNIST problem with LeNet-like architectures based on Absolute activation significantly reduces the number of trained parameters in the neural network while improving the prediction accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper discusses the use of the Absolute activation function in
classification neural networks. Examples of using this activation function in
simple and more complex problems are shown. Using the LeNet-5 network as a
baseline for the MNIST problem, the efficiency of the Absolute activation
function is demonstrated in comparison with the Tanh, ReLU and SeLU
activations. It is shown that in deep networks Absolute activation does not
cause vanishing or exploding gradients, and therefore Absolute activation can
be used in both simple and deep neural networks. Because training networks
with Absolute activation is highly volatile, a special modification of the
ADAM training algorithm is used: at each training epoch it estimates a lower
bound on the accuracy over any test dataset from an analysis of the validation
dataset, uses this value to stop training or decrease the learning rate, and
re-initializes the ADAM algorithm between these steps. It is shown that
solving the MNIST problem with LeNet-like architectures based on Absolute
activation significantly reduces the number of trained parameters while
improving the prediction accuracy.
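To make the idea concrete, below is a minimal sketch of a LeNet-5-style MNIST classifier in which the usual Tanh/ReLU nonlinearity is replaced by the Absolute activation f(x) = |x|. PyTorch and the classic LeNet-5 layer sizes are assumptions here; the paper's exact architecture and framework may differ.

```python
# Minimal sketch (assumptions: PyTorch, classic LeNet-5 layer sizes) of a
# LeNet-5-style MNIST classifier using the Absolute activation f(x) = |x|.
import torch
import torch.nn as nn


class Abs(nn.Module):
    """Absolute activation. Its derivative is sign(x), whose magnitude is 1,
    consistent with the paper's claim that it causes neither vanishing nor
    exploding gradients in deep networks."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.abs(x)


class LeNet5Abs(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), Abs(),  # 28x28 -> 28x28
            nn.AvgPool2d(2),                                    # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), Abs(),             # -> 10x10
            nn.AvgPool2d(2),                                    # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), Abs(),
            nn.Linear(120, 84), Abs(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = LeNet5Abs()
    logits = model(torch.randn(8, 1, 28, 28))  # a batch of 8 MNIST-sized images
    print(logits.shape)                        # torch.Size([8, 10])
```

The modified ADAM schedule described in the abstract can likewise be sketched as a validation-driven loop: estimate a lower bound on accuracy at each epoch, and when it stops improving, decrease the learning rate and re-create the Adam optimizer, stopping once the rate becomes negligible. The particular lower-bound estimate (validation accuracy minus one binomial standard deviation), the decay factor and the stopping threshold below are illustrative assumptions, not the paper's exact rule.

```python
# Illustrative sketch of a validation-driven ADAM schedule (the specific bound,
# decay factor and stopping threshold are assumptions, not taken from the paper).
import math
import torch


def accuracy_lower_bound(correct: int, total: int) -> float:
    # Validation accuracy minus one binomial standard deviation (assumed bound).
    p = correct / total
    return p - math.sqrt(p * (1.0 - p) / total)


def fit(model, train_loader, val_loader, epochs=100, lr=1e-3, min_lr=1e-6):
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_bound = -1.0
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

        # Estimate the accuracy lower bound on the validation set.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        bound = accuracy_lower_bound(correct, total)

        if bound > best_bound:
            best_bound = bound
        else:
            lr *= 0.1                 # decrease the learning rate ...
            if lr < min_lr:
                break                 # ... or stop once it becomes negligible
            # Re-create the optimizer (fresh moment estimates) at the new rate.
            optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    return model
```

Re-creating the optimizer resets Adam's running moment estimates, which is one plausible reading of "re-initializes ADAM between these steps".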
Related papers
- ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT).
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, with an accuracy gap above 40% in some scenarios.
arXiv Detail & Related papers (2023-07-02T21:46:30Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are better suited to active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Consensus Function from an $L_p^q-$norm Regularization Term for its Use as Adaptive Activation Functions in Neural Networks [0.0]
We propose the definition and utilization of an implicit, parametric, non-linear activation function that adapts its shape during the training process.
This increases the space of parameters to optimize within the network, but it allows greater flexibility and generalizes the concept of neural networks.
Preliminary results show that the use of neural networks with this type of adaptive activation function reduces the error in regression and classification examples.
arXiv Detail & Related papers (2022-06-30T04:48:14Z)
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
- Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds [99.23098204458336]
Certified robustness is a desirable property for deep neural networks in safety-critical applications.
We show that our method consistently outperforms state-of-the-art methods on the MNIST and TinyImageNet datasets.
arXiv Detail & Related papers (2021-11-02T06:44:10Z)
- Optimization of Weights and Activation Functions of Neural Networks Applied to Time Series Forecasting ("Otimizacao de pesos e funcoes de ativacao de redes neurais aplicadas na previsao de series temporais") [0.0]
We propose the use of a family of asymmetric activation functions with free parameters for neural networks.
We show that this family of defined activation functions satisfies the requirements of the universal approximation theorem.
A methodology is used for the global optimization of this family of activation functions with free parameters together with the weights of the connections between the processing units of the neural network.
arXiv Detail & Related papers (2021-07-29T23:32:15Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Cooperative Initialization based Deep Neural Network Training [35.14235994478142]
Our approach uses multiple activation functions during the initial few epochs to update all sets of weight parameters while training the network.
Our approach outperforms various baselines and, at the same time, performs well across various tasks such as classification and detection.
arXiv Detail & Related papers (2020-01-05T14:08:46Z)