Approximating Activation Functions
- URL: http://arxiv.org/abs/2001.06370v1
- Date: Fri, 17 Jan 2020 15:25:44 GMT
- Title: Approximating Activation Functions
- Authors: Nicholas Gerard Timmons, Andrew Rice
- Abstract summary: We use function approximation techniques to develop replacements for hyperbolic tangent and sigmoid functions.
We find safe approximations that yield a 10% to 37% improvement in training times on the CPU.
Our functions also match or considerably outperform the ad-hoc approximations used in Theano and in the implementation of Word2Vec.
- Score: 3.8834605840347667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ReLU is widely seen as the default choice for activation functions in neural
networks. However, there are cases where more complicated functions are
required. In particular, recurrent neural networks (such as LSTMs) make
extensive use of both hyperbolic tangent and sigmoid functions. These functions
are expensive to compute. We used function approximation techniques to develop
replacements for these functions and evaluated them empirically on three
popular network configurations. We find safe approximations that yield a 10% to
37% improvement in training times on the CPU. These approximations were
suitable for all cases we considered and we believe are appropriate
replacements for all networks using these activation functions. We also develop
ranged approximations which only apply in some cases due to restrictions on
their input domain. Our ranged approximations yield a performance improvement
of 20% to 53% in network training time. Our functions also match or
considerably outperform the ad-hoc approximations used in Theano and in the
implementation of Word2Vec.
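
The abstract does not reproduce the approximants themselves, but the flavour of the approach can be illustrated with two standard techniques: a ranged lookup-table sigmoid (similar in spirit to the precomputed table in the Word2Vec implementation) and deriving tanh from it via the identity tanh(x) = 2*sigmoid(2x) - 1. The NumPy sketch below is illustrative only; the function names, table size, and clipping range are assumptions, not the paper's code.

```python
import numpy as np

# Precompute sigmoid on a fixed input range.  Inputs outside [-MAX_X, MAX_X]
# are clipped to the table ends (a "ranged"-style shortcut).
MAX_X = 6.0          # assumed clipping range, not from the paper
TABLE_SIZE = 1024    # assumed resolution, not from the paper
_xs = np.linspace(-MAX_X, MAX_X, TABLE_SIZE)
_sigmoid_table = 1.0 / (1.0 + np.exp(-_xs))

def fast_sigmoid(x):
    """Table-lookup sigmoid: no exp() at evaluation time."""
    idx = ((np.asarray(x) + MAX_X) / (2 * MAX_X) * (TABLE_SIZE - 1)).astype(int)
    return _sigmoid_table[np.clip(idx, 0, TABLE_SIZE - 1)]

def fast_tanh(x):
    """tanh(x) = 2 * sigmoid(2x) - 1, reusing the sigmoid table."""
    return 2.0 * fast_sigmoid(2.0 * x) - 1.0

if __name__ == "__main__":
    x = np.linspace(-5.0, 5.0, 11)
    print(np.max(np.abs(fast_sigmoid(x) - 1.0 / (1.0 + np.exp(-x)))))  # small error
    print(np.max(np.abs(fast_tanh(x) - np.tanh(x))))                   # small error
```

Whether a clipped table like this counts as "safe" for a given network depends on how far activations stray outside the table range, which is exactly the distinction the paper draws between its safe approximations and its ranged ones.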
Related papers
- OPAF: Optimized Secure Two-Party Computation Protocols for Nonlinear Activation Functions in Recurrent Neural Network [8.825150825838769]
This paper pays special attention to the implementation of non-linear functions in the semi-honest model with two-party settings.
We propose a novel and efficient protocol for exponential function by using a divide-and-conquer strategy.
Next, we take advantage of the symmetry of sigmoid and Tanh, and fine-tune the inputs to reduce the number of 2PC building blocks.
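
The symmetry referred to here is the standard pair of identities sigmoid(-x) = 1 - sigmoid(x) and tanh(-x) = -tanh(x), which let the expensive evaluation be restricted to non-negative inputs. Below is a plain (non-secure) Python sketch of that reduction with hypothetical function names; in the 2PC setting the costly call would be the secure protocol rather than ordinary arithmetic.

```python
import math

def sigmoid_nonneg(x):
    # Stand-in for the expensive evaluation; only ever called with x >= 0.
    # In the two-party setting this would be the costly secure protocol.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_via_symmetry(x):
    # sigmoid(-x) = 1 - sigmoid(x): evaluate the costly part on |x| only.
    s = sigmoid_nonneg(abs(x))
    return s if x >= 0 else 1.0 - s

def tanh_via_symmetry(x):
    # tanh is odd: tanh(-x) = -tanh(x).
    t = math.tanh(abs(x))
    return t if x >= 0 else -t
```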
arXiv Detail & Related papers (2024-03-01T02:49:40Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint [48.25573695787407]
We prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness.
Our theory partially justifies the benefits of using deep and wide networks in practice.
arXiv Detail & Related papers (2022-06-09T15:35:22Z)
- Otimizacao de pesos e funcoes de ativacao de redes neurais aplicadas na previsao de series temporais (Optimization of weights and activation functions of neural networks applied to time series forecasting) [0.0]
We propose the use of a family of free-parameter asymmetric activation functions for neural networks.
We show that this family of activation functions satisfies the requirements of the universal approximation theorem.
A methodology is used for the global optimization of this family of free-parameter activation functions together with the weights of the connections between the processing units of the neural network.
arXiv Detail & Related papers (2021-07-29T23:32:15Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs.
Experiments on several benchmark datasets and neural architectures illustrate that the binary network learned using our method achieves state-of-the-art accuracy.
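
The construction described here corresponds to the classical Fourier (sine) series of a square wave: a truncated sum of odd sine harmonics approximates sign(x) on a bounded interval and, unlike sign itself, has a usable derivative. The sketch below illustrates that idea only; the number of harmonics, the period, and the function names are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def sign_fourier(x, n_terms=8, period=4.0):
    """Truncated Fourier sine series of a square wave: approximates sign(x)
    on (-period/2, period/2) and is differentiable, unlike sign itself."""
    L = period / 2.0
    ks = np.arange(1, 2 * n_terms, 2)                     # odd harmonics 1, 3, 5, ...
    terms = np.sin(np.outer(np.atleast_1d(x), ks) * np.pi / L) / ks
    return (4.0 / np.pi) * terms.sum(axis=1)

def sign_fourier_grad(x, n_terms=8, period=4.0):
    """Analytic derivative of the truncated series: a smooth surrogate for
    the gradient of sign(x), which is zero almost everywhere."""
    L = period / 2.0
    ks = np.arange(1, 2 * n_terms, 2)
    terms = np.cos(np.outer(np.atleast_1d(x), ks) * np.pi / L) * (np.pi / L)
    return (4.0 / np.pi) * terms.sum(axis=1)

# Roughly -1 for negative inputs and +1 for positive inputs within the period,
# with some ringing from the truncation.
print(sign_fourier(np.array([-1.0, -0.25, 0.0, 0.25, 1.0])))
```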
arXiv Detail & Related papers (2021-03-01T08:25:26Z)
- S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training [0.4893345190925178]
We introduce S++, a simple, robust, and deployable framework for training a neural network (NN) using private data from multiple sources.
For the first time, we provide fast and verifiable protocols for all common activation functions and optimize them for running in a secret-shared manner.
arXiv Detail & Related papers (2021-01-28T15:48:54Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks [0.0]
It is critical to choose the most appropriate activation function in neural network computations.
This research paper evaluates the commonly used activation functions, such as swish, ReLU, sigmoid, and so forth.
arXiv Detail & Related papers (2020-10-15T11:09:34Z)
- UNIPoint: Universally Approximating Point Processes Intensities [125.08205865536577]
We provide a proof that a class of learnable functions can universally approximate any valid intensity function.
We implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event.
arXiv Detail & Related papers (2020-07-28T09:31:56Z)
- Activation functions are not needed: the ratio net [3.9636371287541086]
This paper focuses on designing a new function approximator.
Instead of designing new activation functions or kernel functions, the proposed network uses a fractional form.
It shows that, in most cases, the ratio net converges faster and outperforms both the conventional network and the RBF on classification tasks.
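
The "fractional form" suggests a learnable ratio, in the spirit of a rational (Padé-style) approximant, taking the place of a fixed activation function. The paper's exact parameterization is not reproduced here; the following is a generic numerator-over-denominator layer, with the class name, initialization, and denominator stabilization all being illustrative assumptions.

```python
import numpy as np

class RatioLayer:
    """Illustrative fractional-form layer: output = (x @ W1 + b1) / (|x @ W2 + b2| + eps).
    A generic rational map used in place of an activation function; an assumed
    parameterization for illustration, not necessarily the paper's."""

    def __init__(self, in_dim, out_dim, eps=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.b1 = np.zeros(out_dim)
        self.W2 = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.b2 = np.ones(out_dim)  # start the denominator near 1
        self.eps = eps

    def __call__(self, x):
        num = x @ self.W1 + self.b1
        den = np.abs(x @ self.W2 + self.b2) + self.eps  # keep the denominator away from zero
        return num / den  # no elementwise activation function is applied
```

Stacking such layers yields a rational function of the input, which is one sense in which no separate activation function is needed.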
arXiv Detail & Related papers (2020-05-14T01:07:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.