Empirical Loss Landscape Analysis of Neural Network Activation Functions
- URL: http://arxiv.org/abs/2306.16090v1
- Date: Wed, 28 Jun 2023 10:46:14 GMT
- Title: Empirical Loss Landscape Analysis of Neural Network Activation Functions
- Authors: Anna Sergeevna Bosman, Andries Engelbrecht, Marde Helbig
- Abstract summary: Activation functions play a significant role in neural network design by enabling non-linearity.
This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Activation functions play a significant role in neural network design by
enabling non-linearity. The choice of activation function was previously shown
to influence the properties of the resulting loss landscape. Understanding the
relationship between activation functions and loss landscape properties is
important for neural architecture and training algorithm design. This study
empirically investigates neural network loss landscapes associated with
hyperbolic tangent, rectified linear unit, and exponential linear unit
activation functions. Rectified linear unit is shown to yield the most convex
loss landscape, while exponential linear unit is shown to yield the least flat
loss landscape and to exhibit superior generalisation performance. The presence
of wide and narrow valleys in the loss landscape is established for
presence of wide and narrow valleys in the loss landscape is established for
all activation functions, and the narrow valleys are shown to correlate with
saturated neurons and implicitly regularised network configurations.
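As a small-scale illustration of the kind of empirical probing the abstract describes, the sketch below trains a toy classifier with each of the three activation functions and samples the loss along a single random direction in weight space around the trained solution; flatter or sharper profiles hint at the local landscape each activation induces. The model size, synthetic data, and single-direction probe are illustrative assumptions, not the authors' experimental protocol.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: one small classifier per activation function,
# probed along a single random weight-space direction (illustrative only).
torch.manual_seed(0)
X = torch.randn(256, 10)
y = (X.sum(dim=1) > 0).long()

def make_model(activation: nn.Module) -> nn.Sequential:
    return nn.Sequential(nn.Linear(10, 32), activation, nn.Linear(32, 2))

def train(model, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

def loss_profile(model, radius=1.0, points=21):
    """Loss along one random direction through the trained weights."""
    loss_fn = nn.CrossEntropyLoss()
    theta = [p.detach().clone() for p in model.parameters()]
    direction = [torch.randn_like(p) for p in theta]
    profile = []
    for alpha in torch.linspace(-radius, radius, points):
        with torch.no_grad():
            for p, t, d in zip(model.parameters(), theta, direction):
                p.copy_(t + alpha * d)
            profile.append(loss_fn(model(X), y).item())
    with torch.no_grad():  # restore the trained weights
        for p, t in zip(model.parameters(), theta):
            p.copy_(t)
    return profile

for name, act in [("tanh", nn.Tanh()), ("relu", nn.ReLU()), ("elu", nn.ELU())]:
    model = train(make_model(act))
    print(name, [round(v, 3) for v in loss_profile(model)])
```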
Related papers
- Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
We show how to transform cross-entropy and mean squared error into dynamical loss functions.
We show how they significantly improve validation accuracy for networks of varying sizes.
arXiv Detail & Related papers (2024-10-14T16:27:03Z)
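For the dynamical loss functions entry above, one way such a transformation could look is a cross-entropy whose per-class weights oscillate with the training step; the cosine schedule, period, and amplitude below are assumptions made for illustration, not necessarily the construction used in that paper.

```python
import torch
import torch.nn.functional as F

def dynamical_cross_entropy(logits, targets, step, num_classes,
                            period=100, amplitude=0.5):
    """Cross-entropy whose per-class weights oscillate with the training step.

    The cosine schedule is a hypothetical choice, used only to illustrate
    the idea of a time-dependent ("dynamical") loss function.
    """
    phase = 2 * torch.pi * step / period
    offsets = torch.arange(num_classes) * (2 * torch.pi / num_classes)
    weights = 1.0 + amplitude * torch.cos(phase + offsets)  # one weight per class
    return F.cross_entropy(logits, targets, weight=weights)

# Usage inside a training loop (model and data assumed to exist):
# loss = dynamical_cross_entropy(model(x), y, step=t, num_classes=10)
```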
- Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations [5.731640425517324]
We show that under certain conditions, the residual loss of PINNs can be globally minimized by a wide neural network.
An activation function with well-behaved high-order derivatives plays a crucial role in minimizing the residual loss.
The established theory paves the way for designing and choosing effective activation functions for PINNs.
arXiv Detail & Related papers (2024-05-02T19:08:59Z)
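To make the residual loss discussed in the PINN entry concrete, the following minimal sketch sets up the residual of a 1D Poisson problem u''(x) = f(x) with a tanh network, whose smooth higher-order derivatives are exactly what the quoted result relies on; the specific PDE, architecture, and omitted boundary terms are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal PINN-style residual loss for u''(x) = f(x) on [0, 1],
# with f(x) = -pi^2 * sin(pi * x), so the exact solution is u = sin(pi * x).
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def residual_loss(x):
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = -torch.pi ** 2 * torch.sin(torch.pi * x)
    return ((d2u - f) ** 2).mean()

x_collocation = torch.rand(128, 1)
loss = residual_loss(x_collocation)  # a full PINN would add boundary terms
loss.backward()
```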
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of those activation functions is to project samples from their original feature space to a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
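A rough reading of the learnable-connectivity idea in the entry above: give every edge of a small complete graph (ordered as a DAG) a trainable scalar that scales how much one node's output feeds the next, so the connection magnitudes receive gradients alongside the node transformations. The module below is a simplified, assumed sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LearnableConnectivity(nn.Module):
    """Nodes i < j are connected by a learnable edge weight w[i, j]; edge
    magnitudes are trained jointly with the node transformations.
    Simplified, illustrative sketch only."""

    def __init__(self, num_nodes=4, dim=16):
        super().__init__()
        self.nodes = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_nodes)])
        self.edge_weights = nn.Parameter(torch.ones(num_nodes, num_nodes))

    def forward(self, x):
        outputs = []
        for j, node in enumerate(self.nodes):
            # Aggregate earlier node outputs, scaled by squashed edge weights.
            incoming = x if j == 0 else x + sum(
                torch.sigmoid(self.edge_weights[i, j]) * outputs[i] for i in range(j)
            )
            outputs.append(torch.relu(node(incoming)))
        return outputs[-1]

block = LearnableConnectivity()
y = block(torch.randn(8, 16))  # edge weights receive gradients on backward()
```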
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
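The regularisation observation in the entry above is straightforward to express: adding $L_1$ and $L_2$ penalties on the weights to the data loss discourages the growth in weight magnitude that the complexity measure tracks. A generic sketch with illustrative coefficients:

```python
import torch

def regularised_loss(data_loss, model, l1=1e-5, l2=1e-4):
    """Data loss plus L1/L2 weight penalties (coefficients are illustrative)."""
    l1_term = sum(p.abs().sum() for p in model.parameters())
    l2_term = sum(p.pow(2).sum() for p in model.parameters())
    return data_loss + l1 * l1_term + l2 * l2_term
```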
- Piecewise linear activations substantially shape the loss surfaces of neural networks [95.73230376153872]
This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks.
We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, defined as local minima with higher empirical risk than the global minima.
For one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
arXiv Detail & Related papers (2020-03-27T04:59:34Z)
- Investigating the interaction between gradient-only line searches and different activation functions [0.0]
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions in neural network training.
We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures.
arXiv Detail & Related papers (2020-02-23T12:28:27Z)
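For the gradient-only line search entry above, the core idea can be sketched as follows: rather than minimising the (possibly discontinuous) loss value along the search direction, the line search looks for the step at which the directional derivative changes sign from negative to positive. The bisection-style routine below is a simplified sketch under that reading (the closure is assumed to zero gradients, recompute the loss, and call backward), not the exact GOLS variants studied in that paper.

```python
import torch

def directional_derivative(closure, params, direction, alpha):
    """Gradient at params + alpha * direction, projected onto the direction."""
    saved = [p.detach().clone() for p in params]
    with torch.no_grad():
        for p, d in zip(params, direction):
            p.add_(alpha * d)
    closure()  # assumed to zero grads, recompute the loss, and backpropagate
    dd = sum((p.grad * d).sum() for p, d in zip(params, direction))
    with torch.no_grad():  # restore the original parameters
        for p, s in zip(params, saved):
            p.copy_(s)
    return dd.item()

def gradient_only_line_search(closure, params, direction, alpha_max=1.0, iters=20):
    """Bisect for the step size where the directional derivative changes sign."""
    lo, hi = 0.0, alpha_max
    if directional_derivative(closure, params, direction, hi) < 0:
        return hi  # still descending at alpha_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if directional_derivative(closure, params, direction, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```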
- Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training [4.7210697296108926]
We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape.
For shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima.
We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points.
arXiv Detail & Related papers (2020-02-07T16:33:34Z)