A Use of Even Activation Functions in Neural Networks
- URL: http://arxiv.org/abs/2011.11713v1
- Date: Mon, 23 Nov 2020 20:33:13 GMT
- Title: A Use of Even Activation Functions in Neural Networks
- Authors: Fuchang Gao and Boyu Zhang
- Abstract summary: We propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions.
We show that using an even activation function in one of the fully connected layers improves neural network performance.
- Score: 0.35172332086962865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite broad interest in applying deep learning techniques to scientific
discovery, learning interpretable formulas that accurately describe scientific
data is very challenging because of the vast landscape of possible functions
and the "black box" nature of deep neural networks. The key to success is to
effectively integrate existing knowledge or hypotheses about the underlying
structure of the data into the architecture of deep learning models to guide
machine learning. Currently, such integration is commonly done through
customization of the loss functions. Here we propose an alternative approach to
integrate existing knowledge or hypotheses of data structure by constructing
custom activation functions that reflect this structure. Specifically, we study
a common case when the multivariate target function $f$ to be learned from the
data is partially exchangeable, i.e., $f(u,v,w)=f(v,u,w)$ for $u,v\in
\mathbb{R}^d$. For instance, these conditions are satisfied for the
classification of images that is invariant under left-right flipping. Through
theoretical proof and experimental verification, we show that using an even
activation function in one of the fully connected layers improves neural
network performance. In our experimental 9-dimensional regression problems,
replacing one of the non-symmetric activation functions with the designated
"Seagull" activation function $\log(1+x^2)$ results in substantial improvement
in network performance. Surprisingly, even activation functions are seldom used
in neural networks. Our results suggest that customized activation functions
have great potential in neural networks.
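Below is a minimal PyTorch sketch of the substitution the abstract describes: the even "Seagull" activation $\log(1+x^2)$ placed in one fully connected layer of a small regression network. Only the activation form and the 9-dimensional input come from the abstract; the layer widths, the position of the even layer, and the use of ReLU elsewhere are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Seagull(nn.Module):
    """Even "Seagull" activation log(1 + x^2) described in the abstract."""
    def forward(self, x):
        return torch.log1p(x ** 2)

# Sanity check that the activation is even: seagull(-x) == seagull(x).
act = Seagull()
x = torch.randn(4, 9)
assert torch.allclose(act(x), act(-x))

# Illustrative placement: the Seagull activation replaces a non-symmetric
# activation (e.g. ReLU) in one of the fully connected layers. Layer widths
# are arbitrary choices for this sketch.
model = nn.Sequential(
    nn.Linear(9, 64),   # 9-dimensional regression input, as in the experiments
    Seagull(),          # even activation in one hidden layer
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
print(model(x).shape)   # torch.Size([4, 1])
```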
Related papers
- Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of functions with complexity, is a key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Data-aware customization of activation functions reduces neural network error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error.
A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - Consensus Function from an $L_p^q$-norm Regularization Term for its Use as Adaptive Activation Functions in Neural Networks [0.0]
We propose the definition and utilization of an implicit, parametric, non-linear activation function that adapts its shape during the training process.
This increases the number of parameters to optimize within the network, but it allows greater flexibility and generalizes the concept of neural networks.
Preliminary results show that the use of these neural networks with this type of adaptive activation functions reduces the error in regression and classification examples.
arXiv Detail & Related papers (2022-06-30T04:48:14Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Deep Learning for Functional Data Analysis with Adaptive Basis Layers [11.831982475316641]
We introduce neural networks that employ a new Basis Layer whose hidden units are each basis functions, themselves implemented as micro neural networks.
Our architecture learns to apply parsimonious dimension reduction to functional inputs that focuses only on information relevant to the target rather than irrelevant variation in the input function.
arXiv Detail & Related papers (2021-06-19T04:05:13Z) - The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Estimating Multiplicative Relations in Neural Networks [0.0]
We will use properties of logarithmic functions to propose a pair of activation functions which can translate products into linear expressions and learn using backpropagation.
We will try to generalize this approach to some complex arithmetic functions and test the accuracy on a distribution disjoint from the training set (a sketch of the core idea follows this list).
arXiv Detail & Related papers (2020-10-28T14:28:24Z)
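The last entry above describes using logarithms to turn products into linear expressions whose coefficients (the exponents) can be learned by backpropagation. The sketch below illustrates that general idea under stated assumptions; the module name, the bias-free linear layer, the epsilon, and the restriction to positive inputs are illustrative choices, not the paper's exact pair of activation functions.

```python
import torch
import torch.nn as nn

class LogLinearExp(nn.Module):
    """Sketch: exp(W @ log(x)) computes products of the inputs raised to
    learnable exponents, because log turns x1^a1 * x2^a2 into
    a1*log(x1) + a2*log(x2)."""
    def __init__(self, in_features, out_features, eps=1e-6):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.eps = eps  # keeps log finite near zero (illustrative choice)

    def forward(self, x):  # assumes x > 0
        return torch.exp(self.linear(torch.log(x + self.eps)))

# Toy usage: recover y = x1 * x2 from positive inputs; the learned weights
# should approach [[1., 1.]], i.e. unit exponents on both inputs.
unit = LogLinearExp(2, 1)
opt = torch.optim.Adam(unit.parameters(), lr=0.05)
x = torch.rand(256, 2) + 0.5
y = (x[:, 0] * x[:, 1]).unsqueeze(1)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(unit(x), y)
    loss.backward()
    opt.step()
print(unit.linear.weight.data)
```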