Bayesian Recurrent Units and the Forward-Backward Algorithm
- URL: http://arxiv.org/abs/2207.10486v1
- Date: Thu, 21 Jul 2022 14:00:52 GMT
- Title: Bayesian Recurrent Units and the Forward-Backward Algorithm
- Authors: Alexandre Bittar and Philip N. Garner
- Abstract summary: Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward recursion similar to the forward-backward algorithm.
The resulting Bayesian recurrent units can be integrated as recurrent neural networks within deep learning frameworks.
Experiments on speech recognition indicate that adding the derived units at the end of state-of-the-art recurrent architectures can improve the performance at a very low cost in terms of trainable parameters.
- Score: 91.39701446828144
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward
recursion similar to the forward-backward algorithm. The resulting Bayesian
recurrent units can be integrated as recurrent neural networks within deep
learning frameworks, while retaining a probabilistic interpretation from the
direct correspondence with hidden Markov models. Whilst the contribution is
mainly theoretical, experiments on speech recognition indicate that adding the
derived units at the end of state-of-the-art recurrent architectures can
improve the performance at a very low cost in terms of trainable parameters.
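For orientation, the classical forward-backward algorithm for a discrete hidden Markov model, which the abstract cites as the analogue of its backward recursion, can be sketched as below. This is a minimal illustration of the textbook algorithm only, not the paper's Bayesian recurrent unit. The function name `forward_backward` and its parameters (`init`, `trans`, `emit`, `obs`) are hypothetical names chosen for this sketch.

```python
def forward_backward(init, trans, emit, obs):
    """Posterior state probabilities gamma[t][i] = P(state_t = i | obs).

    Textbook HMM smoothing (an assumption-level sketch, not the paper's unit):
    init[i]     -- prior probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability that state i emits symbol o
    obs         -- list of observed symbol indices
    """
    n = len(init)
    T = len(obs)

    # Forward pass: alpha[t][i] is proportional to P(state_t = i, obs[0..t]).
    # Each step is renormalised to avoid numerical underflow.
    alpha = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[i] * emit[i][o] for i in range(n)]
        else:
            prev = alpha[-1]
            row = [emit[j][o] * sum(prev[i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        norm = sum(row)
        alpha.append([v / norm for v in row])

    # Backward pass: beta[t][i] is proportional to P(obs[t+1..] | state_t = i).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o_next = obs[t + 1]
        row = [sum(trans[i][j] * emit[j][o_next] * beta[t + 1][j]
                   for j in range(n))
               for i in range(n)]
        norm = sum(row)
        beta[t] = [v / norm for v in row]

    # Combine both passes and normalise to get the smoothed posteriors.
    gamma = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(n)]
        norm = sum(row)
        gamma.append([v / norm for v in row])
    return gamma


if __name__ == "__main__":
    # Toy two-state model with made-up probabilities, for illustration only.
    posteriors = forward_backward(
        init=[0.5, 0.5],
        trans=[[0.7, 0.3], [0.4, 0.6]],
        emit=[[0.9, 0.1], [0.2, 0.8]],
        obs=[0, 1, 0],
    )
    for t, row in enumerate(posteriors):
        print(t, row)
```

Because the forward and backward messages are renormalised at every step, only the final per-step normalisation is needed to recover proper posteriors.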
Related papers
- Emergent representations in networks trained with the Forward-Forward algorithm [0.6597195879147556]
We show that the representations in networks trained with the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity.
Results suggest that the learning procedure proposed by Forward-Forward may be superior to Backpropagation in modelling learning in the cortex.
arXiv Detail & Related papers (2023-05-26T14:39:46Z) - A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z) - ResMem: Learn what you can and memorize the rest [79.19649788662511]
We propose the residual-memorization (ResMem) algorithm to augment an existing prediction model.
By construction, ResMem can explicitly memorize the training labels.
We show that ResMem consistently improves the test set generalization of the original prediction model.
arXiv Detail & Related papers (2023-02-03T07:12:55Z) - Reconstruction Probing [7.647452554776166]
We propose a new analysis method for contextualized representations based on reconstruction probabilities in masked language models.
We find that contextualization boosts reconstructability of tokens that are close to the token being reconstructed in terms of linear and syntactic distance.
We extend our analysis to finer decomposition of contextualized representations, and we find that these boosts are largely attributable to static and positional embeddings at the input layer.
arXiv Detail & Related papers (2022-12-21T06:22:03Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize Runge-Kutta neural networks to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Transformer Meets Boundary Value Inverse Problems [4.165221477234755]
A Transformer-based deep direct sampling method is proposed for solving a class of boundary value inverse problems.
A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and reconstructed images.
arXiv Detail & Related papers (2022-09-29T17:45:25Z) - End-to-end reconstruction meets data-driven regularization for inverse problems [2.800608984818919]
We propose an unsupervised approach for learning end-to-end reconstruction operators for ill-posed inverse problems.
The proposed method combines the classical variational framework with iterative unrolling.
We demonstrate with the example of X-ray computed tomography (CT) that our approach outperforms state-of-the-art unsupervised methods.
arXiv Detail & Related papers (2021-06-07T12:05:06Z) - Relaxing the Constraints on Predictive Coding Models [62.997667081978825]
Predictive coding is an influential theory of cortical function which posits that the principal computation the brain performs is the minimization of prediction errors.
Standard implementations of the algorithm still involve potentially neurally implausible features such as identical forward and backward weights, backward nonlinear derivatives, and 1-1 error unit connectivity.
In this paper, we show that these features are not integral to the algorithm and can be removed either directly or through learning additional sets of parameters with Hebbian update rules without noticeable harm to learning performance.
arXiv Detail & Related papers (2020-10-02T15:21:37Z) - Towards a Theoretical Understanding of the Robustness of Variational Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We develop a novel criterion for robustness in probabilistic models: $r$-robustness.
We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z) - Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
arXiv Detail & Related papers (2020-06-22T08:44:52Z)