Related papers: Hadamard Representations: Augmenting Hyperbolic Tangents in RL

URL: http://arxiv.org/abs/2406.09079v2
Date: Wed, 23 Oct 2024 08:05:57 GMT
Title: Hadamard Representations: Augmenting Hyperbolic Tangents in RL
Authors: Jacob E. Kooi, Mark Hoogendoorn, Vincent François-Lavet
Abstract summary: Activation functions are one of the key components of a deep neural network. In reinforcement learning, the performance of continuously differentiable activations often falls short as compared to linear-unit functions. We show faster learning, a reduction in dead neurons and increased effective rank using deep Q-networks.
Score: 8.647614188619315
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Activation functions are one of the key components of a deep neural network. The most commonly used activation functions can be classed into the category of continuously differentiable (e.g. tanh) and linear-unit functions (e.g. ReLU), both having their own strengths and drawbacks with respect to downstream performance and representation capacity through learning (e.g. measured by the number of dead neurons and the effective rank). In reinforcement learning, the performance of continuously differentiable activations often falls short as compared to linear-unit functions. We provide insights into the vanishing gradients associated with the former, and show that the dying neuron problem is not exclusive to ReLUs. To alleviate vanishing gradients and the resulting dying neuron problem occurring with continuously differentiable activations, we propose a Hadamard representation. Using deep Q-networks and proximal policy optimization in the Atari domain, we show faster learning, a reduction in dead neurons and increased effective rank.

Related papers

Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training [63.3991315762955]
Spiking Neural Networks (SNNs) draw inspiration from biological neurons to create realistic models for brain-like computation. Most existing SNNs assume a single time constant for neuronal membrane voltage dynamics, modeled by first-order ordinary differential equations (ODEs) with Markovian characteristics. We propose the Fractional SPIKE Differential Equation neural network (fspikeDE), which captures long-term dependencies in membrane voltage and spike trains through fractional-order dynamics.
arXiv Detail & Related papers (2025-07-22T18:20:56Z)
Elephant Neural Networks: Born to Be a Continual Learner [7.210328077827388]
Catastrophic forgetting has remained a significant challenge to continual learning for decades. We study the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.
arXiv Detail & Related papers (2023-10-02T17:27:39Z)
Linear Oscillation: A Novel Activation Function for Vision Transformer [0.0]
We present the Linear Oscillation (LoC) activation function, defined as $f(x) = x \times \sin(\alpha x + \beta)$. Distinct from conventional activation functions which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations. Our empirical studies reveal that, when integrated into diverse neural architectures, the LoC activation function consistently outperforms established counterparts like ReLU and Sigmoid.
arXiv Detail & Related papers (2023-08-25T20:59:51Z)
TSSR: A Truncated and Signed Square Root Activation Function for Neural Networks [5.9622541907827875]
We introduce a new activation function called the Truncated and Signed Square Root (TSSR) function. This function is distinctive because it is odd, nonlinear, monotone and differentiable. It has the potential to improve the numerical stability of neural networks.
arXiv Detail & Related papers (2023-08-09T09:40:34Z)
STL: A Signed and Truncated Logarithm Activation Function for Neural Networks [5.9622541907827875]
Activation functions play an essential role in neural networks. We present a novel signed and truncated logarithm function as an activation function. The suggested activation function can be applied in a large range of neural networks.
arXiv Detail & Related papers (2023-07-31T03:41:14Z)
Complexity of Neural Network Training and ETR: Extensions with Effectively Continuous Functions [0.5352699766206808]
We study the complexity of the problem of training neural networks defined via various activation functions. We consider the complexity of the problem with respect to the sigmoid activation function and other effectively continuous functions.
arXiv Detail & Related papers (2023-05-19T17:17:00Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
Synaptic Stripping: How Pruning Can Bring Dead Neurons Back To Life [0.0]
We introduce Synaptic Stripping as a means to combat the dead neuron problem. By automatically removing problematic connections during training, we can regenerate dead neurons. We conduct several ablation studies to investigate these dynamics as a function of network width and depth.
arXiv Detail & Related papers (2023-02-11T23:55:50Z)
Data-aware customization of activation functions reduces neural network error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error. A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z)
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered. Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal. We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons [2.362412515574206]
This paper proposes four new oscillating activation functions inspired by human pyramidal neurons. Oscillating activation functions are non-saturating for all inputs unlike popular activation functions. Using oscillating activation functions instead of popular monotonic or non-monotonic single-zero activation functions enables neural networks to train faster and solve classification problems with fewer layers.
arXiv Detail & Related papers (2021-11-07T07:31:58Z)
Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems. A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function. The new activation function $C(z) = z \cos z$ outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures.
arXiv Detail & Related papers (2021-08-30T01:07:05Z)
Adaptive Rational Activations to Boost Deep Reinforcement Learning [68.10769262901003]
We motivate why rationals are suitable for adaptable activation functions and why their inclusion into neural networks is crucial. We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games.
arXiv Detail & Related papers (2021-02-18T14:53:12Z)
And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks. We define AND-like neurons and propose measures to increase their proportion in the network. Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
arXiv Detail & Related papers (2020-06-08T17:25:22Z)
Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks. In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations. Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy. We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
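
Two entries in the list above give their activation functions in closed form: Linear Oscillation, $f(x) = x \times \sin(\alpha x + \beta)$, and the Growing Cosine Unit, $C(z) = z \cos z$. The following is a minimal sketch of both as scalar Python functions, written directly from those formulas. The function names and the default values `alpha=1.0` and `beta=0.0` are assumptions made for illustration; they are not taken from the papers.

```python
import math


def linear_oscillation(x, alpha=1.0, beta=0.0):
    """LoC as stated in the listing: f(x) = x * sin(alpha * x + beta).

    The defaults for alpha and beta are illustrative assumptions.
    """
    return x * math.sin(alpha * x + beta)


def growing_cosine_unit(z):
    """GCU as stated in the listing: C(z) = z * cos(z)."""
    return z * math.cos(z)


if __name__ == "__main__":
    # Print both activations on a few sample inputs.
    for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(v, linear_oscillation(v), growing_cosine_unit(v))
```

Both functions are zero at the origin and oscillate with an amplitude that grows linearly in the input, which is the property the two summaries contrast with saturating activations such as the sigmoid.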

This list is automatically generated from the titles and abstracts of the papers on this site.