Overcoming Overfitting and Large Weight Update Problem in Linear
Rectifiers: Thresholded Exponential Rectified Linear Units
- URL: http://arxiv.org/abs/2006.02797v1
- Date: Thu, 4 Jun 2020 11:55:47 GMT
- Title: Overcoming Overfitting and Large Weight Update Problem in Linear
Rectifiers: Thresholded Exponential Rectified Linear Units
- Authors: Vijay Pandey
- Abstract summary: The "Thresholded exponential rectified linear unit" (TERELU) activation function works better at alleviating overfitting and the large weight update problem.
We show better performance of neural networks on various datasets when using the TERELU activation compared to other activations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past few years, rectified linear unit activation functions have
shown their significance in neural networks, surpassing the performance of
sigmoid activations. RELU (Nair & Hinton, 2010), ELU (Clevert et al., 2015),
PRELU (He et al., 2015), LRELU (Maas et al., 2013), SRELU (Jin et al., 2016),
and ThresholdedRELU each have their own significance over the others in some
aspect. Most of the time these activation functions suffer from the bias shift
problem due to a non-zero output mean, and from the large weight update problem
in deep complex networks due to a unit gradient, which result in slower
training and high variance in model prediction, respectively. In this paper, we
propose the "Thresholded exponential rectified linear unit" (TERELU) activation
function, which works better at alleviating overfitting and the large weight
update problem. Along with alleviating overfitting, this method also provides a
good amount of non-linearity compared to other linear rectifiers. We show
better performance on various datasets using neural networks with the TERELU
activation compared to other activations.
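For reference, the rectifiers named in the abstract have simple closed forms. The sketch below implements the standard definitions of RELU, LRELU, ELU, and ThresholdedRELU, plus a hypothetical TERELU-style unit that saturates exponentially above a threshold; the terelu_sketch function and its theta/alpha parameters are illustrative assumptions, not the formula from the paper.

    import numpy as np

    def relu(x):
        # RELU (Nair & Hinton, 2010): identity for positive inputs, zero otherwise.
        return np.maximum(0.0, x)

    def lrelu(x, slope=0.01):
        # LRELU (Maas et al., 2013): small non-zero slope for negative inputs.
        return np.where(x > 0, x, slope * x)

    def elu(x, alpha=1.0):
        # ELU (Clevert et al., 2015): exponential saturation for negative inputs,
        # which pushes the mean activation toward zero (less bias shift).
        return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

    def thresholded_relu(x, theta=1.0):
        # ThresholdedRELU: passes x only when it exceeds the threshold theta.
        return np.where(x > theta, x, 0.0)

    def terelu_sketch(x, theta=3.0, alpha=1.0):
        # Hypothetical TERELU-style rectifier (NOT the paper's exact formula):
        # ELU-like below the threshold, exponential saturation above it, so the
        # gradient is no longer unit-valued for very large positive inputs.
        below = np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))
        above = theta + alpha * (1.0 - np.exp(-np.maximum(x - theta, 0.0)))
        return np.where(x <= theta, below, above)

The relevant contrast is the gradient for large positive inputs: it stays at 1 for the standard rectifiers, while the saturating branch of the sketch decays toward 0, which is the kind of behaviour that would temper large weight updates.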
Related papers
- Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm).
arXiv Detail & Related papers (2024-08-28T11:12:27Z)
- Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling [0.2900810893770134]
In some applications, such as microgrid day-ahead energy scheduling with a neural network model of battery degradation, the input features of the trained learning model are variables to be solved in the optimization model.
The use of nonlinear activation functions in the neural network makes such problems extremely hard to solve, if not unsolvable.
This paper investigates different methods for linearizing nonlinear activation functions, with a particular focus on the widely used rectified linear unit (ReLU) function; the standard mixed-integer big-M encoding of ReLU is sketched after this list.
arXiv Detail & Related papers (2023-10-03T02:47:38Z)
- Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs)
We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks [14.943863837083496]
We propose a Linearly Scaled Hyperbolic Tangent (LiSHT) for Neural Networks (NNs) by scaling the Tanh linearly.
We observe superior performance using Multi-layer Perceptron (MLP), Residual Network (ResNet) and Long Short-Term Memory (LSTM) models for data classification, image classification and tweet classification tasks.
arXiv Detail & Related papers (2019-01-01T02:24:06Z)
- Gaussian Error Linear Units (GELUs) [58.195342948092964]
We propose a neural network activation function that weights inputs by their value, rather than gates by their sign.
We find performance improvements across all considered computer vision, natural language processing, and speech tasks.
arXiv Detail & Related papers (2016-06-27T19:20:40Z)
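As referenced in the day-ahead energy scheduling entry above, the usual way to embed the ReLU relation y = max(0, x) in a mixed-integer linear program is the textbook big-M encoding; this is the standard formulation, not necessarily the exact method investigated in that paper. With a binary variable z and a constant M chosen so that |x| <= M:

    y >= 0
    y >= x
    y <= x + M * (1 - z)
    y <= M * z
    z in {0, 1}

Setting z = 1 forces y = x (and x >= 0), while z = 0 forces y = 0 (and x <= 0). Replacing the single large M with tighter per-neuron bounds keeps the linear relaxation from becoming overly loose.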