Complexity of Neural Network Training and ETR: Extensions with
Effectively Continuous Functions
- URL: http://arxiv.org/abs/2305.11833v1
- Date: Fri, 19 May 2023 17:17:00 GMT
- Title: Complexity of Neural Network Training and ETR: Extensions with
Effectively Continuous Functions
- Authors: Teemu Hankala, Miika Hannula, Juha Kontinen, Jonni Virtema
- Abstract summary: We study the complexity of the problem of training neural networks defined via various activation functions.
We consider the complexity of the problem with respect to the sigmoid activation function and other effectively continuous functions.
- Score: 0.5352699766206808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the complexity of the problem of training neural networks defined
via various activation functions. The training problem is known to be
∃R-complete with respect to linear activation functions and the ReLU
activation function. We consider the complexity of the problem with respect to
the sigmoid activation function and other effectively continuous functions. We
show that these training problems are polynomial-time many-one bireducible to
the existential theory of the reals extended with the corresponding activation
functions. In particular, we establish that the sigmoid activation function
leads to the existential theory of the reals with the exponential function. It
is thus open, and equivalent to the decidability of the existential theory of
the reals with the exponential function, whether training neural networks using
the sigmoid activation function is algorithmically solvable. In contrast, we
obtain that the training problem is undecidable if sinusoidal activation
functions are considered. Finally, we obtain general upper bounds for the
complexity of the training problem in the form of low levels of the
arithmetical hierarchy.
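As a minimal illustration of the reduction from training to logic (our own sketch; the paper's exact formulation of the training problem may differ), consider a single neuron with sigmoid activation $\sigma(t) = 1/(1 + e^{-t})$, rational data points $(x_1, y_1), \dots, (x_n, y_n)$ and a rational error bound $\varepsilon$. Asking whether some weight $w$ and bias $b$ fit every point within error $\varepsilon$ can be expressed as an existential sentence over the reals extended with the exponential function:

$\exists w \, \exists b \, \exists s_1 \cdots \exists s_n \; \bigwedge_{i=1}^{n} \big( s_i \cdot (1 + \exp(-(w x_i + b))) = 1 \;\wedge\; (s_i - y_i)^2 \le \varepsilon^2 \big)$

Here each auxiliary variable $s_i$ stands for $\sigma(w x_i + b)$; clearing the denominator removes the division, so the sentence uses only addition, multiplication, order and $\exp$. The bireducibility stated in the abstract also yields the converse direction, translating such sentences back into training instances.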
Related papers
- Provable In-Context Learning of Nonlinear Regression with Transformers [58.018629320233174]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters. Recent research has actively explored the training dynamics behind ICL. This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z) - STL: A Signed and Truncated Logarithm Activation Function for Neural
Networks [5.9622541907827875]
Activation functions play an essential role in neural networks.
We present a novel signed and truncated logarithm function as activation function.
The suggested activation function can be applied to a wide range of neural networks.
arXiv Detail & Related papers (2023-07-31T03:41:14Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Unification of popular artificial neural network activation functions [0.0]
We present a unified representation of the most popular neural network activation functions.
Adopting Mittag-Leffler functions of fractional calculus, we propose a flexible and compact functional form.
arXiv Detail & Related papers (2023-02-21T21:20:59Z) - Growing Cosine Unit: A Novel Oscillatory Activation Function That Can
Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems.
A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function.
The new activation function C(z) = z cos(z) outperforms Sigmoid, Swish, Mish and ReLU on a variety of architectures.
arXiv Detail & Related papers (2021-08-30T01:07:05Z) - Adaptive Rational Activations to Boost Deep Reinforcement Learning [68.10769262901003]
We motivate why rationals are suitable for adaptable activation functions and why their inclusion into neural networks is crucial.
We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games.
arXiv Detail & Related papers (2021-02-18T14:53:12Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - UNIPoint: Universally Approximating Point Processes Intensities [125.08205865536577]
We provide a proof that a class of learnable functions can universally approximate any valid intensity function.
We implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event.
arXiv Detail & Related papers (2020-07-28T09:31:56Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - A survey on modern trainable activation functions [0.0]
We propose a taxonomy of trainable activation functions and highlight common and distinctive properties of recent and past models.
We show that many of the proposed approaches are equivalent to adding neuron layers which use fixed (non-trainable) activation functions.
arXiv Detail & Related papers (2020-05-02T12:38:43Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz
Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper-bound of the Lipschitz constant.
We numerically compare our scheme with standard ReLU networks and their variations, PReLU and LeakyReLU.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.