Bayesian optimization for sparse neural networks with trainable
activation functions
- URL: http://arxiv.org/abs/2304.04455v2
- Date: Wed, 19 Apr 2023 12:44:09 GMT
- Title: Bayesian optimization for sparse neural networks with trainable
activation functions
- Authors: Mohamed Fakhfakh and Lotfi Chaari
- Abstract summary: We propose a trainable activation function whose parameters need to be estimated.
A fully Bayesian model is developed to automatically estimate from the learning data both the model weights and activation function parameters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the literature on deep neural networks, there is considerable interest in
developing activation functions that can enhance neural network performance. In
recent years, there has been renewed scientific interest in proposing
activation functions that can be trained throughout the learning process, as
they appear to improve network performance, especially by reducing overfitting.
In this paper, we propose a trainable activation function whose parameters need
to be estimated. A fully Bayesian model is developed to automatically estimate
from the learning data both the model weights and activation function
parameters. An MCMC-based optimization scheme is developed to carry out the
inference. The proposed method aims to solve these estimation problems and to
shorten convergence time by using an efficient sampling scheme that guarantees
convergence to the global maximum. The proposed scheme is tested on three
datasets with three different CNNs. Promising results demonstrate the
usefulness of our approach in improving model accuracy, thanks to the proposed
activation function and the Bayesian estimation of the parameters.
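The abstract does not give the functional form of the activation or the details of the sampler, so the sketch below is only an illustration of the two ingredients it describes: an activation whose shape parameters are estimated from data, and a gradient-free MCMC update over those parameters. PyTorch is an assumed dependency, and the (alpha, beta) parameterization and the metropolis_step helper are hypothetical.

```python
import torch
import torch.nn as nn

class TrainableActivation(nn.Module):
    """Hypothetical parametric activation: alpha and beta are estimated from
    the training data, here simply registered as trainable parameters."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # slope on the positive side
        self.beta = nn.Parameter(torch.tensor(0.1))   # slope on the negative side

    def forward(self, x):
        # An asymmetric piecewise-linear non-linearity shaped by (alpha, beta).
        return torch.where(x >= 0, self.alpha * x, self.beta * x)

def metropolis_step(params, log_posterior, step_size=1e-3):
    """One Metropolis-Hastings update over a flat parameter vector, standing in
    for the paper's MCMC-based optimization scheme (illustration only)."""
    proposal = params + step_size * torch.randn_like(params)
    log_ratio = log_posterior(proposal) - log_posterior(params)
    if torch.log(torch.rand(())) < log_ratio:
        return proposal
    return params
```

In the paper, the network weights and the activation parameters would be estimated jointly from the posterior; the step above only indicates where the activation parameters would enter such a sampler.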
Related papers
- Adaptive Activation Functions for Predictive Modeling with Sparse
Experimental Data [2.012425476229879]
This study investigates the influence of adaptive or trainable activation functions on classification accuracy and predictive uncertainty in settings characterized by limited data availability.
Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models.
arXiv Detail & Related papers (2024-02-08T04:35:09Z)
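As a rough illustration of an ELU with individual trainable parameters, as described in the entry above (the exact parameterization is not specified here), a per-channel trainable-alpha ELU might be written in PyTorch as follows; the class name and the per-channel choice are assumptions.

```python
import torch
import torch.nn as nn

class TrainableELU(nn.Module):
    """ELU whose alpha is trainable, one value per channel
    (hypothetical parameterization, for illustration only)."""
    def __init__(self, num_channels, init_alpha=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((num_channels,), init_alpha))

    def forward(self, x):
        # Broadcast alpha over batch and spatial dimensions: (1, C, 1, 1).
        a = self.alpha.view(1, -1, 1, 1)
        return torch.where(x > 0, x, a * (torch.exp(x) - 1.0))
```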
- ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT).
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, with an accuracy gap of over 40% in some scenarios.
arXiv Detail & Related papers (2023-07-02T21:46:30Z)
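The ENN summary above only states that the non-linearity is expanded in a DCT basis; a minimal sketch of that idea (an assumed form, not the authors' implementation; the sigmoid squashing and the number of coefficients are choices made here) could look like:

```python
import math
import torch
import torch.nn as nn

class DCTActivation(nn.Module):
    """Activation expressed as a truncated cosine (DCT-like) series with
    trainable coefficients (illustrative sketch only)."""
    def __init__(self, num_coeffs=8):
        super().__init__()
        self.coeffs = nn.Parameter(torch.zeros(num_coeffs))
        self.register_buffer("freqs", torch.arange(num_coeffs, dtype=torch.float32))

    def forward(self, x):
        t = torch.sigmoid(x)  # squash inputs into [0, 1] before the cosine expansion
        basis = torch.cos(math.pi * self.freqs * t.unsqueeze(-1))
        return (basis * self.coeffs).sum(dim=-1)  # weighted sum of basis functions
```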
- Improving Neural Additive Models with Bayesian Principles [54.29602161803093]
Neural additive models (NAMs) enhance the transparency of deep neural networks by handling calibrated input features in separate additive sub-networks.
We develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on datasets and challenging real-world medical tasks.
arXiv Detail & Related papers (2023-05-26T13:19:15Z)
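For readers unfamiliar with the additive structure mentioned above: a bare-bones neural additive model routes each input feature through its own small sub-network and sums the contributions. The sketch below is a generic NAM, not the LA-NAM implementation, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TinyNAM(nn.Module):
    """Minimal neural additive model: one small MLP per input feature,
    with outputs summed into a single prediction (illustration only)."""
    def __init__(self, num_features, hidden=16):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, num_features)
        parts = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.stack(parts, dim=0).sum(dim=0) + self.bias
```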
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Efficient Activation Function Optimization through Surrogate Modeling [15.219959721479835]
This paper aims to improve the state of the art through three steps.
First, the benchmarks Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures.
Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization.
arXiv Detail & Related papers (2023-01-13T23:11:14Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Adaptively Customizing Activation Functions for Various Layers [10.522556291990437]
In this work, a novel methodology is proposed to adaptively customize activation functions only by adding very few parameters to the traditional activation functions like Sigmoid, Tanh, and ReLU.
To verify the effectiveness of the proposed methodology, some theoretical and experimental analysis on accelerating the convergence and improving the performance is presented.
The results show that the proposed methodology is very simple yet delivers significant gains in convergence speed, precision, and generalization, and that it surpasses popular activations such as ReLU and adaptive functions such as Swish in almost all experiments in terms of overall performance.
arXiv Detail & Related papers (2021-12-17T11:23:03Z)
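The entry above describes adding very few parameters to standard activations; one plausible form (an assumption for illustration, not the authors' exact formulation) is a per-layer trainable input slope and output scale wrapped around ReLU, Sigmoid, or Tanh:

```python
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Wraps a standard activation with two extra trainable scalars per layer:
    an input slope k and an output scale s (hypothetical formulation)."""
    def __init__(self, base="relu"):
        super().__init__()
        self.base = {"relu": torch.relu, "sigmoid": torch.sigmoid, "tanh": torch.tanh}[base]
        self.k = nn.Parameter(torch.tensor(1.0))  # trainable input slope
        self.s = nn.Parameter(torch.tensor(1.0))  # trainable output scale

    def forward(self, x):
        return self.s * self.base(self.k * x)
```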
- Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network whose partial layers are iteratively reused to refine its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
- Otimização de pesos e funções de ativação de redes neurais aplicadas na previsão de séries temporais (Optimization of weights and activation functions of neural networks applied to time series forecasting) [0.0]
We propose the use of a family of free parameter asymmetric activation functions for neural networks.
We show that this family of defined activation functions satisfies the requirements of the universal approximation theorem.
A methodology is used for the global optimization of this family of free-parameter activation functions together with the weights of the connections between the processing units of the neural network.
arXiv Detail & Related papers (2021-07-29T23:32:15Z)