Data-aware customization of activation functions reduces neural network error
- URL: http://arxiv.org/abs/2301.06635v1
- Date: Mon, 16 Jan 2023 23:38:37 GMT
- Title: Data-aware customization of activation functions reduces neural network error
- Authors: Fuchang Gao, Boyu Zhang
- Abstract summary: We show that data-aware customization of activation functions can result in striking reductions in neural network error.
A simple substitution with the ``seagull'' activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
- Score: 0.35172332086962865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions play critical roles in neural networks, yet current
off-the-shelf neural networks pay little attention to the specific choice of
activation functions used. Here we show that data-aware customization of
activation functions can result in striking reductions in neural network error.
We first give a simple linear algebraic explanation of the role of activation
functions in neural networks; then, through connection with the
Diaconis-Shahshahani Approximation Theorem, we propose a set of criteria for
good activation functions. As a case study, we consider regression tasks with a
partially exchangeable target function, \emph{i.e.} $f(u,v,w)=f(v,u,w)$ for
$u,v\in \mathbb{R}^d$ and $w\in \mathbb{R}^k$, and prove that for such a target
function, using an even activation function in at least one of the layers
guarantees that the prediction preserves partial exchangeability for best
performance. Since even activation functions are seldom used in practice, we
designed the ``seagull'' even activation function $\log(1+x^2)$ according to
our criteria. Empirical testing on over two dozen 9-25 dimensional examples
with different local smoothness, curvature, and degree of exchangeability
revealed that a simple substitution with the ``seagull'' activation function in
an already-refined neural network can lead to an order-of-magnitude reduction
in error. This improvement was most pronounced when the activation function
substitution was applied to the layer in which the exchangeable variables are
connected for the first time. While the improvement is greatest for
low-dimensional data, experiments on the CIFAR10 image classification dataset
showed that use of ``seagull'' can reduce error even for high-dimensional
cases. These results collectively highlight the potential of customizing
activation functions as a general approach to improve neural network
performance.
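To make the abstract's recipe concrete, here is a minimal sketch, assuming PyTorch; the module and layer names (Seagull, ExchangeableRegressor, the hidden width) are illustrative choices and not code from the paper. It implements the seagull activation $\log(1+x^2)$ and places it in the layer where the exchangeable inputs $u$ and $v$ are first connected, the placement the abstract reports as most effective.
```python
# Minimal sketch (not from the paper): the "seagull" activation log(1 + x^2)
# as a PyTorch module, substituted into one layer of a small regression MLP.
# Layer sizes and placement are illustrative assumptions.
import torch
import torch.nn as nn


class Seagull(nn.Module):
    """Even activation function: f(x) = log(1 + x^2)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.log1p(x.pow(2))  # log1p is numerically stable near 0


class ExchangeableRegressor(nn.Module):
    """MLP for f(u, v, w) with u, v in R^d and w in R^k.

    The seagull activation is placed in the first layer that mixes the
    exchangeable blocks u and v, following the abstract's observation that
    the substitution helps most where those variables are first connected.
    """

    def __init__(self, d: int, k: int, hidden: int = 64):
        super().__init__()
        self.mixing = nn.Linear(2 * d + k, hidden)   # u, v, w meet here
        self.act1 = Seagull()                        # even activation here
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, u, v, w):
        x = torch.cat([u, v, w], dim=-1)
        return self.head(self.act1(self.mixing(x)))


# Forward pass on random data; d and k are arbitrary example sizes.
model = ExchangeableRegressor(d=4, k=1)
u, v, w = torch.randn(8, 4), torch.randn(8, 4), torch.randn(8, 1)
pred = model(u, v, w)
```
In an existing model, the "simple substitution" the abstract describes amounts to replacing that one layer's activation (e.g. a ReLU) with Seagull() and retraining.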
Related papers
- TSSR: A Truncated and Signed Square Root Activation Function for Neural Networks [5.9622541907827875]
We introduce a new activation function called the Truncated and Signed Square Root (TSSR) function.
This function is distinctive because it is odd, nonlinear, monotone and differentiable.
It has the potential to improve the numerical stability of neural networks.
arXiv Detail & Related papers (2023-08-09T09:40:34Z)
- STL: A Signed and Truncated Logarithm Activation Function for Neural Networks [5.9622541907827875]
Activation functions play an essential role in neural networks.
We present a novel signed and truncated logarithm function as an activation function.
The suggested activation function can be applied in a wide range of neural networks.
arXiv Detail & Related papers (2023-07-31T03:41:14Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight-decay-regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection [50.14730810124592]
Submodular functions and variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization.
We propose FLEXSUBNET, a family of flexible neural models for both monotone and non-monotone submodular functions.
arXiv Detail & Related papers (2022-10-20T06:00:45Z)
- Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions during training according to the input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
- Consensus Function from an $L_p^q-$norm Regularization Term for its Use as Adaptive Activation Functions in Neural Networks [0.0]
We propose the definition and utilization of an implicit, parametric, non-linear activation function that adapts its shape during the training process.
This increases the space of parameters to optimize within the network, but it allows greater flexibility and generalizes the concept of neural networks.
Preliminary results show that the use of these neural networks with this type of adaptive activation functions reduces the error in regression and classification examples.
arXiv Detail & Related papers (2022-06-30T04:48:14Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to activate and select only a sparse set of neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose the Graph-adaptive Rectified Linear Unit (GReLU), a new parametric activation function that incorporates neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems.
A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function.
The new activation function C(z) = z cos z outperforms Sigmoid, Swish, Mish, and ReLU on a variety of architectures (see the sketch after this list).
arXiv Detail & Related papers (2021-08-30T01:07:05Z)
- A Use of Even Activation Functions in Neural Networks [0.35172332086962865]
We propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions.
We show that using an even activation function in one of the fully connected layers improves neural network performance.
arXiv Detail & Related papers (2020-11-23T20:33:13Z)
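The Growing Cosine Unit entry above gives an explicit formula, C(z) = z cos z. The following is a minimal sketch, again assuming PyTorch and written here for illustration rather than taken from that paper's code, of the GCU as a drop-in module that could replace Seagull in the earlier sketch for comparison.
```python
# Minimal sketch (assumption: PyTorch, as in the earlier example): the
# oscillatory Growing Cosine Unit C(z) = z * cos(z) from the related-papers
# list, written as a drop-in activation module.
import torch
import torch.nn as nn


class GrowingCosineUnit(nn.Module):
    """Oscillatory activation: C(z) = z * cos(z)."""

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * torch.cos(z)


# Example: evaluate the activation on a few points.
gcu = GrowingCosineUnit()
print(gcu(torch.linspace(-3.0, 3.0, 7)))
```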