Short-Term Memory Optimization in Recurrent Neural Networks by
Autoencoder-based Initialization
- URL: http://arxiv.org/abs/2011.02886v1
- Date: Thu, 5 Nov 2020 14:57:16 GMT
- Title: Short-Term Memory Optimization in Recurrent Neural Networks by
Autoencoder-based Initialization
- Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu
- Abstract summary: We explore an alternative solution based on explicit memorization using linear autoencoders for sequences.
We show how such pretraining can better support solving hard classification tasks with long sequences.
We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
- Score: 79.42778415729475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training RNNs to learn long-term dependencies is difficult due to vanishing
gradients. We explore an alternative solution based on explicit memorization
using linear autoencoders for sequences, which maximizes short-term memory and
can be solved in closed form without backpropagation. We introduce an
initialization scheme that pretrains the weights of a recurrent neural network
to approximate the linear autoencoder of the input sequences, and we show how
such pretraining better supports solving hard classification tasks with long
sequences. We test our approach on sequential and permuted MNIST. We show that
the proposed approach achieves a much lower reconstruction error for long
sequences and better gradient propagation during the finetuning phase.
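To make the pretraining step more concrete, the sketch below is a simplified, illustrative take on the idea: fit a linear autoencoder for sequences without backpropagation (here via a truncated SVD of the matrix of reversed input prefixes, followed by a least-squares recovery of the linear recurrence) and use the resulting matrices to initialize an RNN's input and recurrent weights. The function name `laes_init` and the least-squares step are assumptions made for illustration; this is not the paper's exact closed-form construction.

```python
import numpy as np

def laes_init(sequences, hidden_size):
    """Illustrative sketch: fit a linear autoencoder for sequences (LAES)
    without backpropagation and return (A, B) to initialize an RNN's
    input-to-hidden and hidden-to-hidden weight matrices.

    `sequences` has shape (n_seqs, T, d). This simplified variant builds
    the matrix of reversed prefixes, takes a truncated SVD to obtain
    memory states, and recovers the encoder of the linear recurrence
    y_t = A x_t + B y_{t-1} by least squares. It conveys the idea only;
    it is not the authors' exact closed-form solution.
    """
    n, T, d = sequences.shape
    # Data matrix: one row per (sequence, time step) holding the reversed
    # prefix x_t, x_{t-1}, ..., x_1, zero-padded to length T*d.
    Xi = np.zeros((n * T, T * d))
    for i, seq in enumerate(sequences):
        for t in range(T):
            prefix = seq[t::-1].reshape(-1)          # x_t, ..., x_1 flattened
            Xi[i * T + t, :prefix.size] = prefix
    # Truncated SVD gives an optimal rank-p linear encoding of the prefixes
    # (hidden_size must not exceed min(n*T, T*d)).
    _, _, Vt = np.linalg.svd(Xi, full_matrices=False)
    V_p = Vt[:hidden_size].T                         # (T*d, p) projection
    Y = Xi @ V_p                                     # memory states y_t
    # Recover A and B by least squares on y_t ≈ A x_t + B y_{t-1},
    # stacking all time steps of all sequences.
    X_t, Y_prev, Y_t = [], [], []
    for i, seq in enumerate(sequences):
        for t in range(T):
            X_t.append(seq[t])
            Y_prev.append(Y[i * T + t - 1] if t > 0 else np.zeros(hidden_size))
            Y_t.append(Y[i * T + t])
    Z = np.hstack([np.array(X_t), np.array(Y_prev)])     # (n*T, d + p)
    W, *_ = np.linalg.lstsq(Z, np.array(Y_t), rcond=None)
    A, B = W[:d].T, W[d:].T                              # (p, d), (p, p)
    return A, B
```

In practice the pretrained matrices would be copied into the recurrent network before finetuning, e.g. into a PyTorch `nn.RNN`'s `weight_ih_l0` and `weight_hh_l0` parameters.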
Related papers
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence
Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the ADAM optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Low-memory stochastic backpropagation with multi-channel randomized trace estimation [6.985273194899884]
We propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique.
Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint.
We discuss the performance of networks trained this way and how the approximation error can be controlled while keeping memory usage and computational overhead low (a generic trace-estimation sketch appears after this list).
arXiv Detail & Related papers (2021-06-13T13:54:02Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Recurrent Neural Networks for Stochastic Control Problems with Delay [0.76146285961466]
We propose and systematically study deep neural network-based algorithms to solve stochastic control problems with delay features.
Specifically, we employ neural networks for sequence modeling to parameterize the policy and optimize the objective function.
The proposed algorithms are tested on three benchmark examples: a linear-quadratic problem, optimal consumption with fixed finite delay, and portfolio optimization with complete memory.
arXiv Detail & Related papers (2021-01-05T07:18:47Z)
- Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
arXiv Detail & Related papers (2020-09-11T11:56:34Z)
- Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training [12.755664985045582]
We propose a Tunable Subnetwork Splitting Method (TSSM) to tune the decomposition of deep neural networks.
Our proposed TSSM can achieve significant speedup without observable loss of training accuracy.
arXiv Detail & Related papers (2020-09-09T01:05:12Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with the nonconvexity of the underlying optimization problem makes learning sensitive to initialization.
We propose fusing neighboring layers of deeper networks that are initialized with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
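As a side note on the randomized trace estimation entry above: the snippet below is a generic, textbook-style Hutchinson estimator, included only to illustrate the underlying idea of estimating a trace from a handful of matrix-vector products. It is not that paper's multi-channel construction for convolutional-layer gradients, and the helper name `hutchinson_trace` is assumed for illustration.

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_probes=16, rng=None):
    """Generic Hutchinson-style randomized trace estimator:
    tr(A) ~= (1/k) * sum_i z_i^T A z_i with Rademacher probes z_i.
    Only matrix-vector products with A are required.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        total += z @ matvec(z)                  # one matrix-vector product
    return total / num_probes

# Usage: estimate the trace of a matrix accessed only through products.
A = np.random.default_rng(0).standard_normal((256, 256))
A = A @ A.T                                      # symmetric test matrix
est = hutchinson_trace(lambda v: A @ v, dim=256, num_probes=64)
print(est, np.trace(A))                          # estimate vs. exact trace
```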