Overcoming Overfitting and Large Weight Update Problem in Linear
Rectifiers: Thresholded Exponential Rectified Linear Units
- URL: http://arxiv.org/abs/2006.02797v1
- Date: Thu, 4 Jun 2020 11:55:47 GMT
- Title: Overcoming Overfitting and Large Weight Update Problem in Linear
Rectifiers: Thresholded Exponential Rectified Linear Units
- Authors: Vijay Pandey
- Abstract summary: The "Thresholded exponential rectified linear unit" (TERELU) activation function works better at alleviating overfitting and the large weight update problem.
We show better performance of neural networks on various datasets when using the TERELU activation compared to other activations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past few years, rectified linear unit activation functions have
shown their significance in neural networks, surpassing the performance of
sigmoid activations. RELU (Nair & Hinton, 2010), ELU (Clevert et al., 2015),
PRELU (He et al., 2015), LRELU (Maas et al., 2013), SRELU (Jin et al., 2016),
and ThresholdedRELU each have their own significance over the others in some
aspect. Most of the time these activation functions suffer from the bias shift
problem due to a non-zero output mean, and from the large weight update problem
in deep complex networks due to a unit gradient, which result in slower
training and high variance in model prediction, respectively. In this paper, we
propose the "Thresholded exponential rectified linear unit" (TERELU) activation
function, which works better at alleviating overfitting and the large weight
update problem. Along with alleviating overfitting, this method also provides a
good amount of non-linearity compared to other linear rectifiers. We show
better performance on various datasets using neural networks with the TERELU
activation compared to other activations.
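For reference, the rectifiers named in the abstract have simple closed forms. The sketch below implements the standard definitions of RELU, LRELU, ELU, and ThresholdedRELU, plus a hypothetical TERELU-style unit that saturates exponentially above a threshold; the terelu_sketch function and its theta/alpha parameters are illustrative assumptions, not the formula from the paper.

    import numpy as np

    def relu(x):
        # RELU (Nair & Hinton, 2010): identity for positive inputs, zero otherwise.
        return np.maximum(0.0, x)

    def lrelu(x, slope=0.01):
        # LRELU (Maas et al., 2013): small non-zero slope for negative inputs.
        return np.where(x > 0, x, slope * x)

    def elu(x, alpha=1.0):
        # ELU (Clevert et al., 2015): exponential saturation for negative inputs,
        # which pushes the mean activation toward zero (less bias shift).
        return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

    def thresholded_relu(x, theta=1.0):
        # ThresholdedRELU: passes x only when it exceeds the threshold theta.
        return np.where(x > theta, x, 0.0)

    def terelu_sketch(x, theta=3.0, alpha=1.0):
        # Hypothetical TERELU-style rectifier (NOT the paper's exact formula):
        # ELU-like below the threshold, exponential saturation above it, so the
        # gradient is no longer unit-valued for very large positive inputs.
        below = np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))
        above = theta + alpha * (1.0 - np.exp(-np.maximum(x - theta, 0.0)))
        return np.where(x <= theta, below, above)

The relevant contrast is the gradient for large positive inputs: it stays at 1 for the standard rectifiers, while the saturating branch of the sketch decays toward 0, which is the kind of behaviour that would temper large weight updates.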
Related papers
- Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm).
arXiv Detail & Related papers (2024-08-28T11:12:27Z)
- Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling [0.2900810893770134]
In some applications, such as microgrid day-ahead energy scheduling with a neural network model of battery degradation, the input features of the trained learning model are variables to be solved in the optimization model.
The use of nonlinear activation functions in the neural network makes such problems extremely hard to solve, if not unsolvable.
This paper investigates different methods for linearizing nonlinear activation functions, with a particular focus on the widely used rectified linear unit (ReLU) function; the standard mixed-integer big-M encoding of ReLU is sketched after this list.
arXiv Detail & Related papers (2023-10-03T02:47:38Z)
- Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs)
We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks [14.943863837083496]
We propose a Linearly Scaled Hyperbolic Tangent (LiSHT) for Neural Networks (NNs) by scaling the Tanh linearly.
We observe superior performance using Multi-layer Perceptron (MLP), Residual Network (ResNet) and Long Short-Term Memory (LSTM) models for data classification, image classification and tweet classification tasks.
arXiv Detail & Related papers (2019-01-01T02:24:06Z)
- Gaussian Error Linear Units (GELUs) [58.195342948092964]
We propose a neural network activation function that weights inputs by their value, rather than gates by their sign.
We find performance improvements across all considered computer vision, natural language processing, and speech tasks.
arXiv Detail & Related papers (2016-06-27T19:20:40Z)
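As referenced in the day-ahead energy scheduling entry above, the usual way to embed the ReLU relation y = max(0, x) in a mixed-integer linear program is the textbook big-M encoding; this is the standard formulation, not necessarily the exact method investigated in that paper. With a binary variable z and a constant M chosen so that |x| <= M:

    y >= 0
    y >= x
    y <= x + M * (1 - z)
    y <= M * z
    z in {0, 1}

Setting z = 1 forces y = x (and x >= 0), while z = 0 forces y = 0 (and x <= 0). Replacing the single large M with tighter per-neuron bounds keeps the linear relaxation from becoming overly loose.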