Neural networks with trainable matrix activation functions
- URL: http://arxiv.org/abs/2109.09948v5
- Date: Mon, 28 Oct 2024 05:40:19 GMT
- Title: Neural networks with trainable matrix activation functions
- Authors: Zhengqi Liu, Shuhao Cao, Yuwen Li, Ludmil Zikatanov,
- Abstract summary: This work develops a systematic approach to constructing matrix-valued activation functions.
The proposed activation functions depend on parameters that are trained along with the weights and bias vectors.
- Score: 7.999703756441757
- License:
- Abstract: The training process of neural networks usually optimize weights and bias parameters of linear transformations, while nonlinear activation functions are pre-specified and fixed. This work develops a systematic approach to constructing matrix-valued activation functions whose entries are generalized from ReLU. The activation is based on matrix-vector multiplications using only scalar multiplications and comparisons. The proposed activation functions depend on parameters that are trained along with the weights and bias vectors. Neural networks based on this approach are simple and efficient and are shown to be robust in numerical experiments.
Related papers
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Neural Network Verification as Piecewise Linear Optimization:
Formulations for the Composition of Staircase Functions [2.088583843514496]
We present a technique for neural network verification using mixed-integer programming (MIP) formulations.
We derive a strong formulation for each neuron in a network using piecewise linear activation functions.
We also derive a separation procedure that runs in super-linear time in the input dimension.
arXiv Detail & Related papers (2022-11-27T03:25:48Z) - A Tutorial on Neural Networks and Gradient-free Training [0.0]
This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion.
neural networks are mathematical nonlinear functions constructed by composing several vector-valued functions.
arXiv Detail & Related papers (2022-11-26T15:33:11Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
sinusoidal neural networks with activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Consensus Function from an $L_p^q-$norm Regularization Term for its Use
as Adaptive Activation Functions in Neural Networks [0.0]
We propose the definition and utilization of an implicit, parametric, non-linear activation function that adapts its shape during the training process.
This fact increases the space of parameters to optimize within the network, but it allows a greater flexibility and generalizes the concept of neural networks.
Preliminary results show that the use of these neural networks with this type of adaptive activation functions reduces the error in regression and classification examples.
arXiv Detail & Related papers (2022-06-30T04:48:14Z) - Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
arXiv Detail & Related papers (2022-01-31T13:02:41Z) - Otimizacao de pesos e funcoes de ativacao de redes neurais aplicadas na
previsao de series temporais [0.0]
We propose the use of a family of free parameter asymmetric activation functions for neural networks.
We show that this family of defined activation functions satisfies the requirements of the universal approximation theorem.
A methodology for the global optimization of this family of activation functions with free parameter and the weights of the connections between the processing units of the neural network is used.
arXiv Detail & Related papers (2021-07-29T23:32:15Z) - Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z) - Estimating Multiplicative Relations in Neural Networks [0.0]
We will use properties of logarithmic functions to propose a pair of activation functions which can translate products into linear expression and learn using backpropagation.
We will try to generalize this approach for some complex arithmetic functions and test the accuracy on a disjoint distribution with the training set.
arXiv Detail & Related papers (2020-10-28T14:28:24Z) - Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.