Separation of Memory and Processing in Dual Recurrent Neural Networks
- URL: http://arxiv.org/abs/2005.13971v1
- Date: Sun, 17 May 2020 11:38:42 GMT
- Title: Separation of Memory and Processing in Dual Recurrent Neural Networks
- Authors: Christian Oliva and Luis F. Lago-Fernández
- Abstract summary: We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input.
When noise is introduced into the activation function of the recurrent units, these neurons are forced into a binary activation regime that makes the networks behave much as finite automata.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input, and compare it to standard Elman and LSTM architectures in terms of accuracy and interpretability. When noise is introduced into the activation function of the recurrent units, these neurons are forced into a binary activation regime that makes the networks behave much as finite automata. The resulting models are simpler, easier to interpret, and achieve higher accuracy on several sample problems, including the recognition of regular languages, the computation of additions in different bases, and the generation of arithmetic expressions.
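The stacked architecture described in the abstract can be sketched in a few lines: a recurrent layer whose units receive noise inside their activation function, followed by a feedforward output layer that sees both the recurrent state and the raw input. The sizes, weights, and noise level below are illustrative assumptions, not the paper's exact setup, and biases are omitted for brevity.

```python
import math
import random

random.seed(0)

def noisy_sigmoid(z, noise_std=1.0):
    # Gaussian noise is injected before the sigmoid. Under training,
    # only saturated (near 0 or 1) activations are robust to this
    # noise, which pushes the recurrent units into a binary regime.
    return 1.0 / (1.0 + math.exp(-(z + random.gauss(0.0, noise_std))))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dual_rnn_step(x, h, p, noise_std=1.0):
    """One step of the dual architecture: a recurrent layer (memory),
    then a feedforward layer (processing) that also sees the input x."""
    z_h = [a + b for a, b in zip(matvec(p["W_xh"], x), matvec(p["W_hh"], h))]
    h_new = [noisy_sigmoid(z, noise_std) for z in z_h]        # recurrent units
    z_y = [a + b for a, b in zip(matvec(p["W_hy"], h_new), matvec(p["W_xy"], x))]
    y = [1.0 / (1.0 + math.exp(-z)) for z in z_y]             # output layer
    return y, h_new

# Hypothetical sizes: 2 inputs, 3 recurrent units, 1 output.
p = {
    "W_xh": [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)],
    "W_hh": [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)],
    "W_hy": [[random.gauss(0, 1) for _ in range(3)]],
    "W_xy": [[random.gauss(0, 1) for _ in range(2)]],
}
h = [0.0, 0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:  # a short input sequence
    y, h = dual_rnn_step(x, h, p)
```

With the noise removed and the recurrent activations rounded to 0/1, the recurrent layer's state transitions can be read off as a finite-automaton transition table, which is the interpretability argument the abstract makes.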
Related papers
- Lipschitz constant estimation for general neural network architectures using control tools (2024-05-02)
  This paper is devoted to the estimation of the Lipschitz constant of neural networks using semidefinite programming. For this purpose, neural networks are interpreted as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$.
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks (2024-03-18)
  We propose to represent neural networks as computational graphs of parameters. Our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
- Exploring the Approximation Capabilities of Multiplicative Neural Networks for Smooth Functions (2023-01-11)
  We consider two classes of target functions: generalized bandlimited functions and Sobolev-type balls. Our results demonstrate that multiplicative neural networks can approximate these functions with significantly fewer layers and neurons. These findings suggest that multiplicative gates can outperform standard feed-forward layers and have potential for improving neural network design.
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms (2022-11-22)
  We generalize the Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. We demonstrate that regular training of the weight parameters inside the proposed superstructure, on input/output data of various computational problem classes, yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
- Seeking Interpretability and Explainability in Binary Activated Neural Networks (2022-09-07)
  We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks. We present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons, and even weights.
- Bayesian Neural Network Language Modeling for Speech Recognition (2022-08-28)
  State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex. In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
- Data-driven emergence of convolutional structure in neural networks (2022-02-01)
  We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
- Dynamic Inference with Neural Interpreters (2021-10-12)
  We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules. Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
- Implicit recurrent networks: A novel approach to stationary input processing with recurrent neural networks in deep learning (2020-10-20)
  In this work, we introduce and test a novel implementation of recurrent neural networks in deep learning. We provide an algorithm which implements backpropagation on an implicit implementation of recurrent networks. A single-layer implicit recurrent network is able to solve the XOR problem, while a feed-forward network with a monotonically increasing activation function fails at this task.
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory (2020-06-29)
  We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning. We show how to extend the architecture of a simple RNN by separating its hidden state into different modules. We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.