Hadamard Representations: Augmenting Hyperbolic Tangents in RL
- URL: http://arxiv.org/abs/2406.09079v3
- Date: Fri, 29 Nov 2024 13:24:58 GMT
- Title: Hadamard Representations: Augmenting Hyperbolic Tangents in RL
- Authors: Jacob E. Kooi, Mark Hoogendoorn, Vincent François-Lavet
- Abstract summary: Activation functions are one of the key components of a deep neural network.
In reinforcement learning, the performance of continuously differentiable activations often falls short compared to piece-wise linear functions.
We show faster learning, a reduction in dead neurons and increased effective rank using deep Q-networks.
- Score: 8.647614188619315
- License:
- Abstract: Activation functions are one of the key components of a deep neural network. The most commonly used activation functions can be classed as either continuously differentiable (e.g. tanh) or piece-wise linear (e.g. ReLU), each having its own strengths and drawbacks with respect to downstream performance and representation capacity through learning (e.g. measured by the number of dead neurons and the effective rank). In reinforcement learning, the performance of continuously differentiable activations often falls short compared to piece-wise linear functions. We provide insights into the vanishing gradients associated with the former, and show that the dying neuron problem is not exclusive to ReLUs. To alleviate vanishing gradients and the resulting dying neuron problem occurring with continuously differentiable activations, we propose a Hadamard representation. Using deep Q-networks, proximal policy optimization and parallelized Q-networks in the Atari domain, we show faster learning, a reduction in dead neurons and increased effective rank.
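The abstract describes the Hadamard representation only at a high level. Below is a minimal PyTorch sketch of one plausible reading, in which a layer's representation is the element-wise (Hadamard) product of two parallel tanh-activated linear heads; the layer sizes and the use of tanh on both branches are illustrative assumptions, not necessarily the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HadamardTanhLayer(nn.Module):
    """Hidden layer whose output is the Hadamard (element-wise) product of two
    parallel tanh-activated linear heads. A sketch of the general idea only;
    the paper's exact parameterization may differ."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.head_a = nn.Linear(in_features, out_features)
        self.head_b = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Even when one tanh branch saturates (near-zero gradient), gradients
        # can still flow through the other factor of the product.
        return torch.tanh(self.head_a(x)) * torch.tanh(self.head_b(x))

if __name__ == "__main__":
    layer = HadamardTanhLayer(in_features=128, out_features=256)
    h = layer(torch.randn(32, 128))
    print(h.shape)  # torch.Size([32, 256])
```

Metrics such as the dead-neuron count or effective rank mentioned in the abstract would then be computed from a batch of such representations `h`.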
Related papers
- Elephant Neural Networks: Born to Be a Continual Learner [7.210328077827388]
Catastrophic forgetting has remained a significant challenge for continual learning for decades.
We study the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.
We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.
arXiv Detail & Related papers (2023-10-02T17:27:39Z) - Linear Oscillation: A Novel Activation Function for Vision Transformer [0.0]
We present the Linear Oscillation (LoC) activation function, defined as $f(x) = x \times \sin(\alpha x + \beta)$.
Distinct from conventional activation functions which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations.
Our empirical studies reveal that, when integrated into diverse neural architectures, the LoC activation function consistently outperforms established counterparts like ReLU and Sigmoid.
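The LoC definition above translates directly into a few lines of PyTorch; in the sketch below, $\alpha$ and $\beta$ are kept as fixed hyperparameters, which is an assumption since the summary does not say whether they are learned.

```python
import torch
import torch.nn as nn

class LinearOscillation(nn.Module):
    """Linear Oscillation activation f(x) = x * sin(alpha * x + beta).
    alpha and beta are fixed scalars here; whether the paper learns them
    is not stated in the summary."""

    def __init__(self, alpha: float = 1.0, beta: float = 0.0):
        super().__init__()
        self.alpha = alpha
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sin(self.alpha * x + self.beta)
```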
arXiv Detail & Related papers (2023-08-25T20:59:51Z) - TSSR: A Truncated and Signed Square Root Activation Function for Neural Networks [5.9622541907827875]
We introduce a new activation function called the Truncated and Signed Square Root (TSSR) function.
This function is distinctive because it is odd, nonlinear, monotone and differentiable.
It has the potential to improve the numerical stability of neural networks.
arXiv Detail & Related papers (2023-08-09T09:40:34Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Synaptic Stripping: How Pruning Can Bring Dead Neurons Back To Life [0.0]
We introduce Synaptic Stripping as a means to combat the dead neuron problem.
By automatically removing problematic connections during training, we can regenerate dead neurons.
We conduct several ablation studies to investigate these dynamics as a function of network width and depth.
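The summary above describes Synaptic Stripping only in words; the following rough sketch illustrates the general idea of detecting neurons that never fire over a batch and pruning their problematic (negative) incoming weights. The detection rule and pruning criterion here are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def strip_dead_neurons(linear: nn.Linear, activations: torch.Tensor) -> int:
    """For neurons whose post-ReLU output is zero across the whole batch,
    zero out their negative incoming weights so the pre-activation can become
    positive again. A simplified illustration, not the paper's exact rule."""
    dead = (activations > 0).sum(dim=0) == 0          # bool mask over output units
    if dead.any():
        w = linear.weight                             # shape (out_features, in_features)
        w[dead] = w[dead].clamp(min=0)                # prune negative incoming weights
    return int(dead.sum())
```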
arXiv Detail & Related papers (2023-02-11T23:55:50Z) - Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems.
A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function.
The new activation function $C(z) = z \cos z$ outperforms Sigmoid, Swish, Mish and ReLU on a variety of architectures.
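The Growing Cosine Unit is fully specified by the formula quoted above; a minimal PyTorch version:

```python
import torch
import torch.nn as nn

class GrowingCosineUnit(nn.Module):
    """Growing Cosine Unit activation C(z) = z * cos(z), as stated in the summary."""

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * torch.cos(z)
```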
arXiv Detail & Related papers (2021-08-30T01:07:05Z) - Adaptive Rational Activations to Boost Deep Reinforcement Learning [68.10769262901003]
We motivate why rationals are suitable for adaptable activation functions and why their inclusion into neural networks is crucial.
We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games.
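Rational activations are, in general, learnable ratios of polynomials. The sketch below implements one common "safe" parameterization with denominator $1 + |Q(x)|$ to avoid poles; the polynomial degrees and initialization are illustrative assumptions and may differ from the paper's exact variant.

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Learnable rational activation R(x) = P(x) / (1 + |Q(x)|).
    Degrees and initialization are illustrative, not the paper's exact setup."""

    def __init__(self, numerator_degree: int = 5, denominator_degree: int = 4):
        super().__init__()
        self.a = nn.Parameter(torch.randn(numerator_degree + 1) * 0.1)  # P coefficients
        self.b = nn.Parameter(torch.randn(denominator_degree) * 0.1)    # Q coefficients (no constant)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_powers = torch.stack([x ** i for i in range(len(self.a))], dim=-1)       # 1, x, x^2, ...
        numerator = (p_powers * self.a).sum(dim=-1)
        q_powers = torch.stack([x ** (i + 1) for i in range(len(self.b))], dim=-1)  # x, x^2, ...
        denominator = 1.0 + (q_powers * self.b).sum(dim=-1).abs()
        return numerator / denominator
```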
arXiv Detail & Related papers (2021-02-18T14:53:12Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
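For context, the linear setting referred to above is the classical TD(0) update with a fixed feature map $\phi(s)$, i.e. $\theta \leftarrow \theta + \alpha\,(r + \gamma\,\theta^\top\phi(s') - \theta^\top\phi(s))\,\phi(s)$. A minimal numpy version of that single update step (function name and defaults are illustrative):

```python
import numpy as np

def linear_td0_update(theta, phi_s, phi_s_next, reward,
                      alpha=0.1, gamma=0.99, terminal=False):
    """One TD(0) update for V(s) ~ theta @ phi(s) with fixed linear features."""
    v_s = theta @ phi_s
    v_next = 0.0 if terminal else theta @ phi_s_next
    td_error = reward + gamma * v_next - v_s
    return theta + alpha * td_error * phi_s
```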
arXiv Detail & Related papers (2020-06-08T17:25:22Z) - Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.