Approximate Fixed-Points in Recurrent Neural Networks
- URL: http://arxiv.org/abs/2106.02417v1
- Date: Fri, 4 Jun 2021 11:33:34 GMT
- Title: Approximate Fixed-Points in Recurrent Neural Networks
- Authors: Zhengxiong Wang and Anton Ragni
- Abstract summary: Recurrent neural networks are widely used in speech and language processing.
Due to dependency on the past, standard algorithms for training these models cannot be efficiently parallelised.
This paper shows that recurrent neural networks can be reformulated as fixed-points of non-linear equation systems.
- Score: 10.031004070657122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks are widely used in speech and language processing.
Due to dependency on the past, standard algorithms for training these models,
such as back-propagation through time (BPTT), cannot be efficiently
parallelised. Furthermore, applying these models to more complex structures
than sequences requires inference time approximations, which introduce
inconsistency between inference and training. This paper shows that recurrent
neural networks can be reformulated as fixed-points of non-linear equation
systems. These fixed-points can be computed using an iterative algorithm
exactly and in as many iterations as the length of any given sequence. Each
iteration of this algorithm adds one additional Markovian-like order of
dependencies such that upon termination all dependencies modelled by the
recurrent neural networks have been incorporated. Although exact fixed-points
inherit the same parallelization and inconsistency issues, this paper shows
that approximate fixed-points can be computed in parallel and used consistently
in training and inference including tasks such as lattice rescoring.
Experimental validation is performed in two tasks, Penn Tree Bank and
WikiText-2, and shows that approximate fixed-points yield competitive
prediction performance to recurrent neural networks trained using the BPTT
algorithm.
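The reformulation can be pictured with a minimal numerical sketch (an illustration only, not the authors' implementation; the Elman-style cell, the sizes, and the Jacobi-style parallel update are assumed for concreteness). The whole hidden-state sequence is treated as the unknown of a non-linear system H = F(H, X); every iteration updates all time steps in parallel from the previous iterate and adds one more order of past dependency, so after as many iterations as the sequence length the result coincides with the sequential forward pass, while stopping earlier yields an approximate fixed-point.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Elman-style RNN (sizes and parameters are illustrative, not from the paper).
T, d_in, d_h = 12, 3, 4
W = rng.normal(scale=0.3, size=(d_h, d_h))   # recurrent weights
U = rng.normal(scale=0.3, size=(d_h, d_in))  # input weights
b = np.zeros(d_h)
X = rng.normal(size=(T, d_in))               # input sequence

def sequential_rnn(X):
    """Exact hidden states computed left-to-right (the usual forward pass)."""
    h, H = np.zeros(d_h), np.zeros((T, d_h))
    for t in range(T):
        h = np.tanh(W @ h + U @ X[t] + b)
        H[t] = h
    return H

def fixed_point_iteration(X, n_iters):
    """Jacobi-style update of all time steps at once.

    Iteration k uses the previous iterate's states as the h_{t-1} inputs,
    so it captures dependencies reaching k steps into the past; after T
    iterations the result equals the sequential pass, and stopping earlier
    gives an approximate fixed-point.
    """
    H = np.zeros((T, d_h))
    for _ in range(n_iters):
        H_prev = np.vstack([np.zeros(d_h), H[:-1]])   # shift states by one step
        H = np.tanh(H_prev @ W.T + X @ U.T + b)       # all positions in parallel
    return H

H_exact = sequential_rnn(X)
print("max error, T iterations:", np.abs(fixed_point_iteration(X, T) - H_exact).max())
print("max error, 3 iterations:", np.abs(fixed_point_iteration(X, 3) - H_exact).max())
```
The second printed error shows the approximation incurred when the iteration is truncated well before the sequence length.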
Related papers
- Continual learning with the neural tangent ensemble [0.6137178191238463]
We show that a neural network with N parameters can be interpreted as a weighted ensemble of N classifiers.
We derive the likelihood and posterior probability of each expert given past data.
Surprisingly, we learn that the posterior updates for these experts are equivalent to a scaled and projected form of gradient descent.
arXiv Detail & Related papers (2024-08-30T16:29:09Z)
- Time-Parameterized Convolutional Neural Networks for Irregularly Sampled Time Series [26.77596449192451]
Irregularly sampled time series are ubiquitous in many application domains, leading to sparse, partially observed, and non-aligned observations.
Standard recurrent neural networks (RNNs) and convolutional neural networks (CNNs) assume regular spacing between observation times, which poses significant challenges for modeling irregular time series.
We parameterize convolutional layers with kernels that are explicit functions of the irregular observation times (a toy sketch of this idea follows the entry).
arXiv Detail & Related papers (2023-08-06T21:10:30Z)
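The time-parameterized convolution in the entry above can be illustrated with a small sketch (a personal reading of the idea, not the paper's architecture: the Gaussian kernel, window width, and parameter below are assumptions). The convolution weights are computed as an explicit function of the actual time gaps between observations rather than assuming a regular grid.
```python
import numpy as np

rng = np.random.default_rng(1)

# Irregularly sampled 1-D series: observation times and values (toy data).
times = np.sort(rng.uniform(0.0, 10.0, size=20))
values = np.sin(times) + 0.1 * rng.normal(size=20)

# Hypothetical time-parameterized kernel: the weight is an explicit function
# of the continuous time offset dt, with `alpha` standing in for a learnable
# parameter (an assumption for illustration).
alpha = 1.5

def kernel(dt):
    return np.exp(-alpha * dt ** 2)

def time_param_conv(times, values, width=3):
    """Convolve each observation with its `width` left neighbours, weighting
    by the actual time gaps instead of assuming regular spacing."""
    out = np.zeros_like(values)
    for i in range(len(values)):
        lo = max(0, i - width)
        w = kernel(times[i] - times[lo:i + 1])
        out[i] = np.sum(w * values[lo:i + 1]) / np.sum(w)
    return out

print(time_param_conv(times, values)[:5])
```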
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on back-propagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework, each block can be trained independently, so it can be easily deployed in parallel acceleration systems (see the sketch below).
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
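A compact sketch of the cascaded-block idea summarised above (the random feature transforms, the local cross-entropy heads trained by plain gradient descent, and the averaged prediction are all simplifying assumptions, not the CaFo design itself): each block owns a predictor trained with a purely local loss, so no gradients flow between blocks.
```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3-class problem (synthetic data; sizes are illustrative only).
n, d, n_classes = 300, 10, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, n_classes, size=n)
Y = np.eye(n_classes)[y]                     # one-hot labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

blocks = []
feats = X
for k in range(3):                           # three cascaded blocks
    # Fixed random feature transform for this block (a simplification).
    W_feat = rng.normal(scale=1.0 / np.sqrt(feats.shape[1]), size=(feats.shape[1], 16))
    H = np.maximum(feats @ W_feat, 0.0)      # ReLU features

    # Each block has its own predictor trained with a *local* loss;
    # no gradient flows back through earlier blocks.
    W_head = np.zeros((H.shape[1], n_classes))
    for _ in range(200):                     # plain gradient descent on cross-entropy
        P = softmax(H @ W_head)
        W_head -= 0.1 * H.T @ (P - Y) / n

    blocks.append((W_feat, W_head))
    feats = H                                # features cascade to the next block

def predict(X):
    """At inference, each block emits a label distribution; average them."""
    feats, probs = X, []
    for W_feat, W_head in blocks:
        feats = np.maximum(feats @ W_feat, 0.0)
        probs.append(softmax(feats @ W_head))
    return np.mean(probs, axis=0).argmax(axis=1)

print("train accuracy:", (predict(X) == y).mean())
```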
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations (a small numerical sketch follows this entry).
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
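One way to picture the recurrent superstructure summarised above (a hand-built sketch: the coefficient tensors are set to the classical RK4 tableau rather than learned, and the scalar ODE is an arbitrary test problem): an outer recurrent update combines several inner evaluations of the problem's right-hand side, and particular weight choices reproduce a classical Runge-Kutta solver.
```python
import numpy as np

# Recurrent superstructure for iterative updates:
#   stage_i = f(t + c_i*h, x + h * sum_j a_ij * stage_j)
#   x_next  = x + h * sum_i b_i * stage_i
# The coefficients (a, b, c) play the role of trainable weights; the RK4
# tableau below is a hand-picked example, not a trained result.
a = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
b = np.array([1/6, 1/3, 1/3, 1/6])
c = np.array([0.0, 0.5, 0.5, 1.0])

def superstructure_step(f, t, x, h):
    """One outer iteration; the inner loop evaluates the stages in order."""
    stages = np.zeros(len(b))
    for i in range(len(b)):
        stages[i] = f(t + c[i] * h, x + h * np.dot(a[i], stages))
    return x + h * np.dot(b, stages)

# Solve dx/dt = -x, x(0) = 1, on [0, 1]; the exact answer is exp(-1).
f = lambda t, x: -x
x, h = 1.0, 0.1
for k in range(10):
    x = superstructure_step(f, k * h, x, h)
print("superstructure (RK4 weights):", x, " exact:", np.exp(-1.0))
```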
- Backpropagation Through Time For Networks With Long-Term Dependencies [0.0]
Backpropagation through time (BPTT) is a technique for updating the tuned parameters of recurrent neural networks (RNNs).
We propose using the 'discrete forward sensitivity equation' and a variant of it for single and multiple interacting recurrent loops, respectively.
This solution is exact and also allows the network's parameters to vary between steps; however, it requires the computation of a Jacobian (a scalar sketch is given below).
arXiv Detail & Related papers (2021-03-26T15:55:54Z)
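A scalar sketch of a forward-sensitivity recursion in the spirit of the entry above (the tiny tanh recurrence, the single parameter, and the squared-error loss are illustrative assumptions, not the paper's equations): the sensitivity dh_t/dw is carried forward next to the hidden state, the per-step Jacobian appears explicitly, and the gradient is accumulated online instead of by a backward sweep through time; a finite-difference check confirms the result.
```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny RNN: h_t = tanh(w * h_{t-1} + x_t), scalar hidden state,
# single recurrent parameter w (chosen for clarity, not from the paper).
T = 8
x = rng.normal(size=T)
w = 0.7
targets = rng.normal(size=T)

def forward_with_sensitivity(w):
    """Propagate the sensitivity s_t = dh_t/dw forward alongside h_t.

    Discrete forward-sensitivity recursion:
        s_t = J_t * s_{t-1} + (1 - h_t^2) * h_{t-1},
    where J_t = dh_t/dh_{t-1} is the (here scalar) Jacobian.
    """
    h, s, loss, dloss_dw = 0.0, 0.0, 0.0, 0.0
    for t in range(T):
        h_new = np.tanh(w * h + x[t])
        J = (1.0 - h_new ** 2) * w          # Jacobian dh_t / dh_{t-1}
        s = J * s + (1.0 - h_new ** 2) * h  # forward sensitivity update
        h = h_new
        loss += 0.5 * (h - targets[t]) ** 2
        dloss_dw += (h - targets[t]) * s    # accumulate the gradient online
    return loss, dloss_dw

loss, grad = forward_with_sensitivity(w)
eps = 1e-6
num_grad = (forward_with_sensitivity(w + eps)[0] - forward_with_sensitivity(w - eps)[0]) / (2 * eps)
print("forward-sensitivity grad:", grad, " finite-difference grad:", num_grad)
```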
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Compressive Sensing and Neural Networks from a Statistical Learning Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear second-order RNNs (2-RNNs) defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network [75.44925576268052]
The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches.
Exact probabilistic inference algorithms are typically applied in training and prediction stages of the CRF model.
We propose to employ a parallelizable approximate variational inference algorithm for the CRF model (one possible scheme is sketched below).
arXiv Detail & Related papers (2020-09-17T12:18:43Z)
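The AIN summary above does not spell out the specific variational scheme, so the sketch below shows one parallelizable choice, a mean-field approximation for a linear-chain CRF (the random scores, label count, and fixed iteration budget are assumptions). Every position's distribution is recomputed at once from the previous iterate, instead of by sequential forward-backward recursions.
```python
import numpy as np

rng = np.random.default_rng(4)

# Toy linear-chain CRF scores (random; shapes are illustrative).
T, L = 6, 4                         # sequence length, number of labels
unary = rng.normal(size=(T, L))     # emission scores
trans = rng.normal(size=(L, L))     # transition scores A[y_prev, y_next]

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, trans, n_iters=10):
    """Mean-field approximation q_t(y) for all positions, updated in parallel.

    Each iteration recomputes every position's distribution from the
    *previous* iterate's neighbouring distributions, so all T updates
    can run at once.
    """
    q = softmax(unary)                              # initialise from unaries
    for _ in range(n_iters):
        left = np.zeros((T, L))
        right = np.zeros((T, L))
        left[1:] = q[:-1] @ trans                   # expected score from the left neighbour
        right[:-1] = q[1:] @ trans.T                # expected score from the right neighbour
        q = softmax(unary + left + right)
    return q

q = mean_field(unary, trans)
print("approximate MAP labels:", q.argmax(axis=1))
```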
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power (a toy comparison of both schemes follows below).
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
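A toy illustration of the Jacobi and a hybrid Gauss-Seidel-style scheme for an autoregressive computation, in the spirit of the entry above (the contractive scalar map, the sequence length, and the block size are arbitrary choices): the whole output sequence is treated as the unknown of y = F(y); the Jacobi update touches every position in parallel, while the blocked sweep reuses values already updated within the same pass.
```python
import numpy as np

# Autoregressive computation y_t = f(y_{t-1}) viewed as a nonlinear system
# y = F(y); the map, length and block size below are illustrative choices.
T = 50
f = lambda v: 0.5 * np.sin(v) + 0.3          # a contractive scalar map

# Exact sequential evaluation for reference.
y_exact = np.zeros(T)
for t in range(1, T):
    y_exact[t] = f(y_exact[t - 1])

def jacobi(n_iters):
    """Update every position from the previous iterate: fully parallel."""
    y = np.zeros(T)
    for _ in range(n_iters):
        y = np.concatenate(([0.0], f(y[:-1])))
    return y

def blocked_sweep(n_iters, block=10):
    """Hybrid scheme: Jacobi inside each block, Gauss-Seidel across blocks,
    so later blocks in a sweep already see values updated before them."""
    y = np.zeros(T)
    for _ in range(n_iters):
        for start in range(1, T, block):
            end = min(start + block, T)
            y[start:end] = f(y[start - 1:end - 1])
    return y

for k in (5, 10, 20):
    print(k, "iters | Jacobi err:", np.abs(jacobi(k) - y_exact).max(),
          "| blocked err:", np.abs(blocked_sweep(k) - y_exact).max())
```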
This list is automatically generated from the titles and abstracts of the papers on this site.