Shuffling Recurrent Neural Networks
- URL: http://arxiv.org/abs/2007.07324v1
- Date: Tue, 14 Jul 2020 19:36:10 GMT
- Title: Shuffling Recurrent Neural Networks
- Authors: Michael Rotman and Lior Wolf
- Abstract summary: We propose a novel recurrent neural network model, where the hidden state $h_t$ is obtained by permuting the vector elements of the previous hidden state $h_{t-1}$.
In our model, the prediction is given by a second learned function $s$, applied to the hidden state: $s(h_t)$.
- Score: 97.72614340294547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel recurrent neural network model, where the hidden state
$h_t$ is obtained by permuting the vector elements of the previous hidden state
$h_{t-1}$ and adding the output of a learned function $b(x_t)$ of the input
$x_t$ at time $t$. In our model, the prediction is given by a second learned
function $s$, applied to the hidden state: $s(h_t)$. The method is easy to
implement, extremely efficient, and does not suffer from vanishing or
exploding gradients. In an extensive set of experiments, the method shows
results competitive with the leading baselines from the literature.
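To make the recurrence concrete, the following is a minimal sketch in PyTorch, assuming a single fixed random permutation chosen at initialization, a small MLP for $b$, and a linear readout for $s$; the abstract does not specify these parametrizations, so all architectural choices here (including the class name `ShufflingRNN`) are illustrative rather than the authors' exact model.

```python
import torch
import torch.nn as nn

class ShufflingRNN(nn.Module):
    """Sketch of the recurrence h_t = Pi(h_{t-1}) + b(x_t), y_t = s(h_t),
    where Pi permutes the elements of the hidden state and b, s are learned
    functions (here an MLP and a linear layer; an illustrative assumption)."""

    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()
        # Fixed random permutation of the hidden-state indices, chosen once
        # at initialization (assumption: it is not learned or re-sampled).
        self.register_buffer("perm", torch.randperm(hidden_dim))
        self.b = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.s = nn.Linear(hidden_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, input_dim).
        batch, steps, _ = x.shape
        h = x.new_zeros(batch, self.perm.numel())
        outputs = []
        for t in range(steps):
            h = h[:, self.perm] + self.b(x[:, t])  # permute, then add b(x_t)
            outputs.append(self.s(h))              # prediction y_t = s(h_t)
        return torch.stack(outputs, dim=1)         # (batch, time, output_dim)
```

A usage example: `model = ShufflingRNN(8, 64, 10)` followed by `y = model(torch.randn(2, 50, 8))` produces per-step predictions of shape `(2, 50, 10)`. One way to see the abstract's gradient claim: the Jacobian of the transition with respect to $h_{t-1}$ is a permutation matrix, which is orthogonal with all singular values equal to 1, so backpropagated gradients through the recurrence are neither shrunk nor amplified by the state-to-state map.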
Related papers
- Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new view of neural networks as a collection of binary search trees, where each training iteration corresponds to modifying a small subset of nodes.
We believe this view will have further applications in the design and analysis of deep neural networks (DNNs).
arXiv Detail & Related papers (2022-08-09T02:29:42Z)
- Learning (Very) Simple Generative Models Is Hard [45.13248517769758]
We show that no polynomial-time algorithm can solve this problem even when the output coordinates of the map $\mathbb{R}^d \to \mathbb{R}^{d'}$ are one-hidden-layer ReLU networks with $\mathrm{poly}(d)$ neurons.
A key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially-bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(0,1)$.
arXiv Detail & Related papers (2022-05-31T17:59:09Z)
- A Neural Network Ensemble Approach to System Identification [0.6445605125467573]
We present a new algorithm for learning unknown governing equations from trajectory data.
We approximate the function $f$ using an ensemble of neural networks.
arXiv Detail & Related papers (2021-10-15T21:45:48Z)
- On the Provable Generalization of Recurrent Neural Networks [7.115768009778412]
We analyze training and generalization for Recurrent Neural Networks (RNNs).
We prove a generalization error bound for learning functions without normalization conditions.
We also prove a novel result for learning $N$-variable functions of the input sequence.
arXiv Detail & Related papers (2021-09-29T02:06:33Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- Reconstructing cellular automata rules from observations at nonconsecutive times [7.056222499095849]
Recent experiments show that a deep neural network can be trained to predict the action of Conway's Game of Life automaton.
We describe an alternative network-like method, based on constraint projections, for which such reconstruction is possible.
We demonstrate the method on 1D binary cellular automata that take inputs from $n$ adjacent cells.
arXiv Detail & Related papers (2020-12-03T18:55:40Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer ReLU network trained by gradient descent can provably learn beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)