A ReLU Dense Layer to Improve the Performance of Neural Networks
- URL: http://arxiv.org/abs/2010.13572v1
- Date: Thu, 22 Oct 2020 11:56:01 GMT
- Title: A ReLU Dense Layer to Improve the Performance of Neural Networks
- Authors: Alireza M. Javid, Sandipan Das, Mikael Skoglund, and Saikat Chatterjee
- Abstract summary: We propose ReDense as a simple and low complexity way to improve the performance of trained neural networks.
We experimentally show that ReDense can improve the training and testing performance of various neural network architectures.
- Score: 40.2470651460466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ReDense as a simple and low complexity way to improve the
performance of trained neural networks. We use a combination of random weights
and rectified linear unit (ReLU) activation function to add a ReLU dense
(ReDense) layer to the trained neural network such that it can achieve a lower
training loss. The lossless flow property (LFP) of ReLU is the key to achieving
the lower training loss while keeping the generalization error small. ReDense
does not suffer from the vanishing gradient problem during training due to its
shallow structure. We experimentally show that ReDense can improve the training
and testing performance of various neural network architectures with different
optimization losses and activation functions. Finally, we test ReDense on some of
the state-of-the-art architectures and show the performance improvement on
benchmark datasets.
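The abstract describes appending a dense layer with fixed random weights followed by ReLU on top of an already trained network and then learning a new output mapping, so that the training loss can only decrease while the added structure stays shallow. Below is a minimal PyTorch sketch of that construction under our own assumptions (the base network is frozen, the random layer is built as [V; -V] so its ReLU output remains linearly invertible, and only the new head is trained); it illustrates the idea in the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn

def add_redense_layer(trained_net, out_dim, hidden_dim, num_classes, seed=0):
    """Append a random-weight ReLU dense (ReDense) layer on top of a trained
    network and return a new forward function plus its trainable head.

    Illustrative sketch only: the random layer is stacked as [V; -V], so that
    relu(Vz) - relu(-Vz) = Vz, one simple way to keep the information flow
    lossless (LFP) as long as hidden_dim >= out_dim and V has full rank.
    """
    torch.manual_seed(seed)
    V = torch.randn(hidden_dim, out_dim) / out_dim ** 0.5
    W_random = torch.cat([V, -V], dim=0)            # fixed, never trained
    head = nn.Linear(2 * hidden_dim, num_classes)   # the only trainable part

    def redense_forward(x):
        with torch.no_grad():                       # base network stays frozen
            z = trained_net(x)                      # output of the trained net
        h = torch.relu(z @ W_random.t())            # ReLU dense layer, random weights
        return head(h)

    return redense_forward, head

# Usage: train only `head` with the original task loss for a few epochs.
# base = ...  # an already trained classifier producing `out_dim` outputs
# forward, head = add_redense_layer(base, out_dim=10, hidden_dim=64, num_classes=10)
# opt = torch.optim.Adam(head.parameters(), lr=1e-3)
```

Because the ReLU output of the [V; -V] layer keeps the frozen network's outputs linearly recoverable, the trained head can at worst reproduce the original predictions, which is the intuition behind reaching a lower training loss without a vanishing-gradient issue.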
Related papers
- A Coefficient Makes SVRG Effective [55.104068027239656]
Stochastic Variance Reduced Gradient (SVRG) is a theoretically compelling optimization method.
In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks.
Our analysis finds that, for deeper networks, the strength of the variance reduction term in SVRG should be smaller and decrease as training progresses.
arXiv Detail & Related papers (2023-11-09T18:47:44Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Learning in Feedback-driven Recurrent Spiking Neural Networks using
full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, where a second network is introduced only during training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure to model 8 dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z) - The Impact of Reinitialization on Generalization in Convolutional Neural
Networks [3.462210753108297]
We study the impact of different reinitialization methods in several convolutional architectures across 12 benchmark image classification datasets.
We introduce a new layerwise reinitialization algorithm that outperforms previous methods.
Our takeaway message is that the accuracy of convolutional neural networks can be improved for small datasets using bottom-up layerwise reinitialization.
arXiv Detail & Related papers (2021-09-01T09:25:57Z) - Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture.
We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN).
The proposed method achieves significant improvements over compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z) - Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using an evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on an inference-only hardware without the need to use backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z) - Improving Computational Efficiency in Visual Reinforcement Learning via
Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - Accelerated MRI with Un-trained Neural Networks [29.346778609548995]
We address the reconstruction problem arising in accelerated MRI with un-trained neural networks.
We propose a highly optimized un-trained recovery approach based on a variation of the Deep Decoder.
We find that our un-trained algorithm achieves similar performance to a baseline trained neural network, but a state-of-the-art trained network outperforms the un-trained one.
arXiv Detail & Related papers (2020-07-06T00:01:25Z) - Retrospective Loss: Looking Back to Improve Training of Deep Neural
Networks [15.329684157845872]
We introduce a new retrospective loss to improve the training of deep neural network models.
Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at the current training step towards the optimal parameter state.
Although it is a simple idea, we analyze the method and conduct comprehensive sets of experiments across domains.
arXiv Detail & Related papers (2020-06-24T10:16:36Z) - Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds a term to the loss function that forces a network to mimic some teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
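The last entry above describes knowledge distillation as adding a term to the loss that forces a network to mimic a teacher. A generic distillation objective of that kind is sketched below in PyTorch; the temperature `T` and mixing weight `lam` are illustrative choices, not necessarily the exact MF-KD formulation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, lam=0.5):
    """Task loss plus a term pushing the student to mimic the teacher.

    Generic knowledge-distillation objective of the kind the MF-KD summary
    describes; T (temperature) and lam (mixing weight) are illustrative.
    """
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.log_softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)                 # standard temperature scaling of the KD term
    return (1.0 - lam) * ce + lam * kd
```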
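Further up the list, "A Coefficient Makes SVRG Effective" argues that the variance-reduction term in SVRG should be scaled down, and scaled further as training progresses. A minimal NumPy sketch of one SVRG outer iteration with such a coefficient `alpha` is given below; the per-example gradient function `grad_i` and any schedule for `alpha` are placeholders rather than the paper's exact algorithm.

```python
import numpy as np

def svrg_epoch(w, grad_i, n, lr, alpha):
    """One SVRG outer iteration with a coefficient `alpha` scaling the
    variance-reduction term (alpha=1 recovers plain SVRG; the entry above
    suggests alpha < 1, decaying over training, for deeper networks).

    grad_i(w, i) returns the gradient of the i-th example's loss at w.
    """
    w_snap = w.copy()
    # full gradient at the snapshot parameters
    mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
    for i in np.random.permutation(n):
        # stochastic gradient corrected by a scaled control variate
        g = grad_i(w, i) - alpha * (grad_i(w_snap, i) - mu)
        w = w - lr * g
    return w
```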