Adaptive Activation Functions for Predictive Modeling with Sparse
Experimental Data
- URL: http://arxiv.org/abs/2402.05401v1
- Date: Thu, 8 Feb 2024 04:35:09 GMT
- Title: Adaptive Activation Functions for Predictive Modeling with Sparse
Experimental Data
- Authors: Farhad Pourkamali-Anaraki, Tahamina Nasrin, Robert E. Jensen, Amy M.
Peterson, Christopher J. Hansen
- Abstract summary: This study investigates the influence of adaptive or trainable activation functions on classification accuracy and predictive uncertainty in settings characterized by limited data availability.
Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models.
- Score: 2.012425476229879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A pivotal aspect in the design of neural networks lies in selecting
activation functions, crucial for introducing nonlinear structures that capture
intricate input-output patterns. While the effectiveness of adaptive or
trainable activation functions has been studied in domains with ample data,
like image classification problems, significant gaps persist in understanding
their influence on classification accuracy and predictive uncertainty in
settings characterized by limited data availability. This research aims to
address these gaps by investigating the use of two types of adaptive activation
functions. These functions incorporate shared and individual trainable
parameters per hidden layer and are examined in three testbeds derived from
additive manufacturing problems containing fewer than one hundred training
instances. Our investigation reveals that adaptive activation functions, such
as Exponential Linear Unit (ELU) and Softplus, with individual trainable
parameters, result in accurate and confident prediction models that outperform
fixed-shape activation functions and the less flexible method of using
identical trainable activation functions in a hidden layer. Therefore, this
work presents an elegant way of facilitating the design of adaptive neural
networks in scientific and engineering problems.
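To make the abstract's distinction concrete, the sketch below shows a hidden layer whose ELU activation carries either one trainable shape parameter per neuron (the individual variant) or a single parameter shared by the whole layer (the less flexible variant). This is a minimal PyTorch illustration under assumed names (AdaptiveELU, SmallClassifier); it is not the authors' implementation, and a trainable Softplus could be set up the same way.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveELU(nn.Module):
    """ELU with a trainable shape parameter alpha.

    shared=True  -> one alpha for the whole layer (identical activation per layer)
    shared=False -> one alpha per neuron (individual trainable parameters)
    """
    def __init__(self, num_units: int, shared: bool = False):
        super().__init__()
        size = 1 if shared else num_units
        # Initialize at alpha = 1, which recovers the standard fixed-shape ELU.
        self.alpha = nn.Parameter(torch.ones(size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # elu(x) = x for x > 0, alpha * (exp(x) - 1) otherwise
        return torch.where(x > 0, x, self.alpha * (torch.exp(x) - 1.0))

class SmallClassifier(nn.Module):
    """Tiny MLP of the kind used for sparse-data classification testbeds."""
    def __init__(self, in_features: int, hidden: int, num_classes: int,
                 shared_activation: bool = False):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = AdaptiveELU(hidden, shared=shared_activation)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

if __name__ == "__main__":
    # Synthetic example with fewer than one hundred training instances.
    X = torch.randn(80, 4)
    y = torch.randint(0, 2, (80,))
    model = SmallClassifier(in_features=4, hidden=16, num_classes=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = F.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    print("trained alphas:", model.act.alpha.detach())
```

Setting shared_activation=True recovers the baseline in which every unit in a hidden layer uses an identical trainable activation, which the abstract reports as less flexible than per-neuron parameters.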
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT)
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, with an accuracy gap of over 40% in some scenarios; a simplified cosine-series sketch of this kind of parametrization appears after the related-papers list.
arXiv Detail & Related papers (2023-07-02T21:46:30Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Bayesian optimization for sparse neural networks with trainable
activation functions [0.0]
We propose a trainable activation function whose parameters need to be estimated.
A fully Bayesian model is developed to automatically estimate from the learning data both the model weights and activation function parameters.
arXiv Detail & Related papers (2023-04-10T08:44:44Z) - Efficient Activation Function Optimization through Surrogate Modeling [15.219959721479835]
This paper aims to improve the state of the art through three steps.
First, the benchmarks Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures.
Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization.
arXiv Detail & Related papers (2023-01-13T23:11:14Z) - Stochastic Adaptive Activation Function [1.9199289015460212]
This study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs.
Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.
arXiv Detail & Related papers (2022-10-21T01:57:25Z) - Transformers with Learnable Activation Functions [63.98696070245065]
We use Rational Activation Function (RAF) to learn optimal activation functions during training according to input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z) - Squashing activation functions in benchmark tests: towards eXplainable
Artificial Intelligence using continuous-valued logic [0.0]
This work demonstrates the first benchmark tests that measure the performance of Squashing functions in neural networks.
Three experiments were carried out to examine their usability and a comparison with the most popular activation functions was made for five different network types.
Results indicate that due to the embedded nilpotent logical operators and the differentiability of the Squashing function, it is possible to solve classification problems.
arXiv Detail & Related papers (2020-10-17T10:42:40Z) - Estimating Structural Target Functions using Machine Learning and
Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
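The ENN entry above models activation functions with the Discrete Cosine Transform. The sketch below illustrates the general idea with a truncated cosine series whose coefficients are trained by gradient descent; the clipping range, the number of coefficients, and the initialization are illustrative assumptions, not the ENN formulation.

```python
import math
import torch
import torch.nn as nn

class CosineSeriesActivation(nn.Module):
    """Activation expressed as a truncated cosine (DCT-like) series.

    f(x) = sum_k c_k * cos(pi * k * (x - lo) / (hi - lo)), with trainable c_k.
    Inputs are clipped to [lo, hi] so the basis stays well defined.
    """
    def __init__(self, num_coeffs: int = 8, lo: float = -3.0, hi: float = 3.0):
        super().__init__()
        self.lo, self.hi = lo, hi
        # Small random coefficients; training shapes the nonlinearity from data.
        self.coeffs = nn.Parameter(0.1 * torch.randn(num_coeffs))
        self.register_buffer("k", torch.arange(num_coeffs).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.clamp(x, self.lo, self.hi)
        # Normalize x to [0, 1] and evaluate the cosine basis at each input.
        t = (x - self.lo) / (self.hi - self.lo)
        basis = torch.cos(math.pi * self.k * t.unsqueeze(-1))  # (..., num_coeffs)
        return basis @ self.coeffs                              # (...)

# Example: apply the learned activation element-wise to a batch of hidden features.
act = CosineSeriesActivation(num_coeffs=8)
h = act(torch.randn(32, 16))  # shape preserved: (32, 16)
```

Because the number of trainable coefficients is fixed and small, this parametrization keeps the parameter count low while remaining compatible with standard gradient-based training, which is the property the ENN summary highlights.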
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.