Shuffling Recurrent Neural Networks
- URL: http://arxiv.org/abs/2007.07324v1
- Date: Tue, 14 Jul 2020 19:36:10 GMT
- Title: Shuffling Recurrent Neural Networks
- Authors: Michael Rotman and Lior Wolf
- Abstract summary: We propose a novel recurrent neural network model, where the hidden state $h_t$ is obtained by permuting the vector elements of the previous hidden state $h_{t-1}$.
In our model, the prediction is given by a second learned function $s$, applied to the hidden state: $s(h_t)$.
- Score: 97.72614340294547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel recurrent neural network model, where the hidden state
$h_t$ is obtained by permuting the vector elements of the previous hidden state
$h_{t-1}$ and adding the output of a learned function $b(x_t)$ of the input
$x_t$ at time $t$. In our model, the prediction is given by a second learned
function $s$, applied to the hidden state: $s(h_t)$. The method is easy to
implement, extremely efficient, and does not suffer from vanishing or
exploding gradients. In an extensive set of experiments, the method shows
results competitive with the leading baselines from the literature.
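To make the recurrence concrete, the following is a minimal sketch in PyTorch, assuming a single fixed random permutation chosen at initialization, a small MLP for $b$, and a linear readout for $s$; the abstract does not specify these parametrizations, so all architectural choices here (including the class name `ShufflingRNN`) are illustrative rather than the authors' exact model.

```python
import torch
import torch.nn as nn

class ShufflingRNN(nn.Module):
    """Sketch of the recurrence h_t = Pi(h_{t-1}) + b(x_t), y_t = s(h_t),
    where Pi permutes the elements of the hidden state and b, s are learned
    functions (here an MLP and a linear layer; an illustrative assumption)."""

    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()
        # Fixed random permutation of the hidden-state indices, chosen once
        # at initialization (assumption: it is not learned or re-sampled).
        self.register_buffer("perm", torch.randperm(hidden_dim))
        self.b = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.s = nn.Linear(hidden_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, input_dim).
        batch, steps, _ = x.shape
        h = x.new_zeros(batch, self.perm.numel())
        outputs = []
        for t in range(steps):
            h = h[:, self.perm] + self.b(x[:, t])  # permute, then add b(x_t)
            outputs.append(self.s(h))              # prediction y_t = s(h_t)
        return torch.stack(outputs, dim=1)         # (batch, time, output_dim)
```

A usage example: `model = ShufflingRNN(8, 64, 10)` followed by `y = model(torch.randn(2, 50, 8))` produces per-step predictions of shape `(2, 50, 10)`. One way to see the abstract's gradient claim: the Jacobian of the transition with respect to $h_{t-1}$ is a permutation matrix, which is orthogonal with all singular values equal to 1, so backpropagated gradients through the recurrence are neither shrunk nor amplified by the state-to-state map.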
Related papers
- Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new view of neural networks as a collection of binary search trees, where each training iteration corresponds to modifying a small subset of nodes.
We believe this view will have further applications in the design and analysis of deep neural networks (DNNs).
arXiv Detail & Related papers (2022-08-09T02:29:42Z)
- Learning (Very) Simple Generative Models Is Hard [45.13248517769758]
We show that no polynomial-time algorithm can solve this problem even when the output coordinates of the map $\mathbb{R}^d \to \mathbb{R}^{d'}$ are one-hidden-layer ReLU networks with $\mathrm{poly}(d)$ neurons.
A key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially-bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(0,1)$.
arXiv Detail & Related papers (2022-05-31T17:59:09Z)
- A Neural Network Ensemble Approach to System Identification [0.6445605125467573]
We present a new algorithm for learning unknown governing equations from trajectory data.
We approximate the function $f$ using an ensemble of neural networks.
arXiv Detail & Related papers (2021-10-15T21:45:48Z)
- On the Provable Generalization of Recurrent Neural Networks [7.115768009778412]
We analyze training and generalization for Recurrent Neural Networks (RNNs).
We prove a generalization error bound for learning functions without normalization conditions.
We also prove a novel result for learning $N$-variable functions of the input sequence.
arXiv Detail & Related papers (2021-09-29T02:06:33Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- Reconstructing cellular automata rules from observations at nonconsecutive times [7.056222499095849]
Recent experiments show that a deep neural network can be trained to predict the action of Conway's Game of Life automaton.
We describe an alternative network-like method, based on constraint projections, for which such reconstruction is possible.
We demonstrate the method on 1D binary cellular automata that take inputs from $n$ adjacent cells.
arXiv Detail & Related papers (2020-12-03T18:55:40Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer ReLU network trained by gradient descent can provably learn beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)