Investigating the interaction between gradient-only line searches and
different activation functions
- URL: http://arxiv.org/abs/2002.09889v1
- Date: Sun, 23 Feb 2020 12:28:27 GMT
- Title: Investigating the interaction between gradient-only line searches and
different activation functions
- Authors: D. Kafka and Daniel N. Wilke
- Abstract summary: Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions in neural network training.
We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-only line searches (GOLS) adaptively determine step sizes along
search directions for discontinuous loss functions resulting from dynamic
mini-batch sub-sampling in neural network training. Step sizes in GOLS are
determined by localizing Stochastic Non-Negative Associated Gradient Projection
Points (SNN-GPPs) along descent directions. These are identified by a sign
change in the directional derivative from negative to positive along a descent
direction. Activation functions are a significant component of neural network
architectures as they introduce non-linearities essential for complex function
approximations. The smoothness and continuity characteristics of the activation
functions directly affect the gradient characteristics of the loss function to
be optimized. Therefore, it is of interest to investigate the relationship
between activation functions and different neural network architectures in the
context of GOLS. We find that GOLS are robust for a range of activation
functions, but sensitive to the Rectified Linear Unit (ReLU) activation
function in standard feedforward architectures. The zero derivative in ReLU's
negative input domain can lead to the gradient vector becoming sparse, which
severely affects training. We show that implementing architectural features
such as batch normalization and skip connections can alleviate these
difficulties and benefit training with GOLS for all activation functions
considered.
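To make the sign-change mechanism concrete, the following is a minimal sketch of a gradient-only step-size search on a deterministic toy problem. It is not the authors' GOLS implementation; the bracketing-and-bisection strategy, function names, and tolerances are illustrative assumptions, and the sketch ignores the mini-batch stochasticity that motivates GOLS in the first place.

```python
import numpy as np

def directional_derivative(grad_fn, x, d, alpha):
    """Directional derivative of the loss at x + alpha*d, taken along d."""
    return np.dot(grad_fn(x + alpha * d), d)

def gradient_only_step_size(grad_fn, x, d, alpha0=1e-3, growth=2.0,
                            max_growths=50, tol=1e-6):
    """Locate a step size where the directional derivative along the
    descent direction d changes sign from negative to positive.

    Simplified, deterministic sketch of sign-change localization; it is
    not the GOLS algorithm studied in the paper.
    """
    lo, hi = 0.0, alpha0
    # Grow the bracket until the directional derivative becomes positive.
    for _ in range(max_growths):
        if directional_derivative(grad_fn, x, d, hi) > 0.0:
            break
        lo, hi = hi, hi * growth
    else:
        return hi  # no sign change found within the step-size budget

    # Bisect the bracket to localize the sign change.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad_fn, x, d, mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Toy usage: quadratic loss f(x) = 0.5 * ||x||^2 with gradient grad(x) = x.
grad = lambda x: x
x = np.array([3.0, -2.0])
d = -grad(x)                              # steepest-descent direction
alpha = gradient_only_step_size(grad, x, d)
x_next = x + alpha * d                    # alpha is close to 1.0 here
```

The sketch also hints at the ReLU issue described above: if most entries of the gradient vector are zero, the directional derivative can remain near zero over long stretches of the search direction, making the negative-to-positive sign change difficult to localize reliably.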
Related papers
- Deriving Activation Functions via Integration [0.0]
Activation functions play a crucial role in introducing non-linearities to deep neural networks.
We propose a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding functions through integration.
Our work introduces the Integral of the Exponential Linear Unit (xIELU), a trainable piecewise activation function derived by integrating trainable affine transformations applied to the ELU activation function.
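As a rough illustration of this design-by-integration recipe (using a textbook pair rather than the paper's xIELU, whose exact form is not given here): choosing the logistic sigmoid as the desired gradient and integrating it recovers the softplus activation.

```python
import numpy as np

def sigmoid(x):
    """The gradient we choose to start from: the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    """Its antiderivative, softplus(x) = log(1 + exp(x)), used as the activation."""
    return np.log1p(np.exp(x))

# Numerical check that d/dx softplus(x) matches the chosen gradient.
x = np.linspace(-4.0, 4.0, 9)
h = 1e-5
numeric_grad = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
assert np.allclose(numeric_grad, sigmoid(x), atol=1e-6)
```

xIELU itself starts from trainable affine transformations of ELU rather than a fixed sigmoid, so the resulting activation is parametric; the snippet only shows the integration step of the recipe.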
arXiv Detail & Related papers (2024-11-20T03:24:21Z)
- Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition [0.994853090657971]
We propose a network adversarial method to address the aforementioned challenges.
This is the first method to use different activation functions in a network.
We have achieved a substantial improvement over standard activation functions regarding both training efficiency and predictive accuracy.
arXiv Detail & Related papers (2024-05-04T11:22:30Z)
- Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
- Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs).
We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
- Empirical Loss Landscape Analysis of Neural Network Activation Functions [0.0]
Activation functions play a significant role in neural network design by enabling non-linearity.
This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions.
arXiv Detail & Related papers (2023-06-28T10:46:14Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
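A minimal sketch of such an error-driven collocation scheme (a hypothetical variant, not the paper's exact algorithm; the residual function, weighting, and resampling schedule are assumptions) might resample candidate points with probability proportional to the current residual magnitude:

```python
import numpy as np

def resample_collocation(residual_fn, candidates, n_points, rng):
    """Pick collocation points with probability proportional to the
    magnitude of the current PDE residual at each candidate location.

    Hypothetical sketch of error-driven collocation sampling; the paper's
    adaptive scheme may differ in its weighting and scheduling.
    """
    errors = np.abs(residual_fn(candidates))
    probs = errors / errors.sum()
    idx = rng.choice(len(candidates), size=n_points, replace=False, p=probs)
    return candidates[idx]

# Toy usage with a made-up residual on the 1-D domain [0, 1].
rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 1000)
fake_residual = lambda x: np.exp(-50.0 * (x - 0.3) ** 2) + 1e-3
points = resample_collocation(fake_residual, candidates, n_points=64, rng=rng)
# Most of the 64 points cluster near x = 0.3, where the residual is largest.
```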
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of those activation functions is to project samples from their original feature space into a linearly separable feature space.
This motivates us to explore whether all features need to be transformed by all non-linear activation functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems.
A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function.
The new activation function C(z) = z cos(z) outperforms Sigmoid, Swish, Mish and ReLU on a variety of architectures.
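The stated form C(z) = z cos(z) is simple enough to write down directly; the snippet below is just that formula with its product-rule derivative, not code taken from the paper.

```python
import numpy as np

def gcu(z):
    """Growing Cosine Unit: C(z) = z * cos(z)."""
    return z * np.cos(z)

def gcu_grad(z):
    """Product-rule derivative: C'(z) = cos(z) - z * sin(z)."""
    return np.cos(z) - z * np.sin(z)

z = np.linspace(-5.0, 5.0, 11)
activations = gcu(z)   # oscillatory, non-monotonic response
slopes = gcu_grad(z)   # changes sign repeatedly, unlike ReLU's 0/1 gradient
```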
arXiv Detail & Related papers (2021-08-30T01:07:05Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)