Memory of recurrent networks: Do we compute it right?
- URL: http://arxiv.org/abs/2305.01457v2
- Date: Tue, 10 Sep 2024 07:58:25 GMT
- Title: Memory of recurrent networks: Do we compute it right?
- Authors: Giovanni Ballarin, Lyudmila Grigoryeva, Juan-Pablo Ortega
- Abstract summary: We study the case of linear echo state networks, for which the total memory capacity has been proven to be equal to the rank of the corresponding Kalman controllability matrix.
We show that these issues, often overlooked in the recent literature, are of an exclusively numerical nature.
- Score: 5.03863830033243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical evaluations of the memory capacity (MC) of recurrent neural networks reported in the literature often contradict well-established theoretical bounds. In this paper, we study the case of linear echo state networks, for which the total memory capacity has been proven to be equal to the rank of the corresponding Kalman controllability matrix. We shed light on various reasons for the inaccurate numerical estimations of the memory, and we show that these issues, often overlooked in the recent literature, are of an exclusively numerical nature. More explicitly, we prove that when the Krylov structure of the linear MC is ignored, a gap between the theoretical MC and its empirical counterpart is introduced. As a solution, we develop robust numerical approaches by exploiting a result of MC neutrality with respect to the input mask matrix. Simulations show that the memory curves that are recovered using the proposed methods fully agree with the theory.
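As a rough illustration of the rank characterization described in the abstract, the sketch below computes the total memory capacity of a linear echo state network x_t = A x_{t-1} + C z_t as the rank of the Kalman controllability (Krylov) matrix K = [C, AC, ..., A^{N-1} C]. This is a minimal sketch under these assumptions, not the paper's robust estimation procedure; the function name and toy parameters are ours. Note that the naive rank computation is itself vulnerable to the kind of numerical issues the paper analyzes, since the Krylov columns decay quickly for a stable reservoir.

```python
import numpy as np

def total_memory_capacity(A, C):
    """Rank of the Kalman controllability matrix K = [C, AC, ..., A^(N-1)C],
    which (per the theoretical result cited in the abstract) equals the total
    memory capacity of the linear ESN x_t = A x_{t-1} + C z_t."""
    N = A.shape[0]
    col = C.reshape(N, 1)
    cols = [col]
    for _ in range(N - 1):
        col = A @ col                 # next Krylov vector A^k C
        cols.append(col)
    K = np.hstack(cols)               # N x N controllability (Krylov) matrix
    # Caveat: for a stable reservoir the Krylov columns shrink geometrically,
    # so this naive SVD-based rank can underestimate the true rank for large N
    # -- exactly the numerical pitfall the paper discusses.
    return np.linalg.matrix_rank(K)

# Toy usage: random reservoir rescaled to spectral radius 0.9, random input mask.
rng = np.random.default_rng(0)
N = 20
A = rng.standard_normal((N, N))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
C = rng.standard_normal(N)
print(total_memory_capacity(A, C))    # generically N for a random reservoir
```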
Related papers
- Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Memory Capacity of Nonlinear Recurrent Networks: Is it Informative? [5.03863830033243]
The total memory capacity (MC) of linear recurrent neural networks (RNNs) has been proven to be equal to the rank of the corresponding Kalman controllability matrix.
This fact calls into question the usefulness of this metric for distinguishing the performance of linear RNNs in signal processing.
arXiv Detail & Related papers (2025-02-07T11:06:30Z) - A relativistic continuous matrix product state study of field theories with defects [0.0]
We propose a method to compute expectation values in massive Quantum Field Theories with line defects.
We use a quantization scheme where (imaginary) time runs perpendicularly to the defect.
We demonstrate the effectiveness of this machinery by computing correlation functions of local bulk and defect operators in $\phi^4$ theory with a magnetic line defect.
arXiv Detail & Related papers (2025-01-16T19:00:23Z) - Applications of flow models to the generation of correlated lattice QCD ensembles [69.18453821764075]
Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters.
This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables.
arXiv Detail & Related papers (2024-01-19T18:33:52Z) - Inferring networks from time series: a neural approach [3.115375810642661]
We present a powerful computational method to infer large network adjacency matrices from time series data using a neural network.
We demonstrate our capabilities by inferring line failure locations in the British power grid from its response to a power cut.
arXiv Detail & Related papers (2023-03-30T15:51:01Z) - Log-linear Guardedness and its Implications [116.87322784046926]
Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful.
This work formally defines the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation.
We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.
arXiv Detail & Related papers (2022-10-18T17:30:02Z) - Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations for the backward pass.
This study proposes a memory reduction approach for performing backpropagation through linear layers (a generic sketch of this idea appears after the related-papers list below).
arXiv Detail & Related papers (2022-01-31T13:02:41Z) - Error Bounds of the Invariant Statistics in Machine Learning of Ergodic Itô Diffusions [8.627408356707525]
We study the theoretical underpinnings of machine learning of ergodic Itô diffusions.
We deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients.
arXiv Detail & Related papers (2021-05-21T02:55:59Z) - Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z) - Constant-Expansion Suffices for Compressed Sensing with Generative Priors [26.41623833920794]
We prove a novel uniform concentration theorem for random functions that might not be Lipschitz but satisfy a relaxed notion of Lipschitzness.
Since the WDC is a fundamental matrix concentration inequality at the heart of all existing theoretical guarantees on this problem, our bound yields improvements in all known results on compressed sensing with generative priors, including one-bit recovery, low-rank matrix recovery, and more.
arXiv Detail & Related papers (2020-06-07T19:14:41Z) - Dropout: Explicit Forms and Capacity Control [57.36692251815882]
We investigate capacity control provided by dropout in various machine learning problems.
In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
We evaluate our theoretical findings on real-world datasets, including MovieLens, MNIST, and Fashion-MNIST.
arXiv Detail & Related papers (2020-03-06T19:10:15Z)
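For the "Memory-Efficient Backpropagation through Large Linear Layers" entry above, here is a minimal, hypothetical PyTorch sketch of one way to reduce activation memory in a linear layer: store only a random subset of the input rows during the forward pass and form an unbiased estimate of the weight gradient from them in the backward pass. This is a generic illustration of the memory/accuracy trade-off, not that paper's algorithm; the class name, the keep_prob parameter, and the toy shapes are ours.

```python
import torch

class SubsampledLinearFn(torch.autograd.Function):
    """Hypothetical sketch: keep only a random fraction of the input rows for
    the backward pass. The input gradient stays exact; the weight gradient is
    replaced by an unbiased stochastic estimate built from the stored rows."""

    @staticmethod
    def forward(ctx, x, weight, keep_prob=0.1):
        n = x.shape[0]
        idx = torch.nonzero(torch.rand(n, device=x.device) < keep_prob).squeeze(1)
        ctx.save_for_backward(x[idx], weight, idx)   # store a subset, not all of x
        ctx.keep_prob = keep_prob
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_sub, weight, idx = ctx.saved_tensors
        grad_x = grad_out @ weight                    # exact input gradient
        # Unbiased estimate of grad_W = grad_out^T x using the kept rows only.
        grad_w = grad_out[idx].t() @ x_sub / ctx.keep_prob
        return grad_x, grad_w, None

# Toy usage: a 512 -> 256 linear map on a batch of 1024 rows.
x = torch.randn(1024, 512, requires_grad=True)
w = torch.randn(256, 512, requires_grad=True)
loss = SubsampledLinearFn.apply(x, w).sum()
loss.backward()
print(w.grad.shape)   # torch.Size([256, 512]), a stochastic gradient estimate
```

The input gradient only needs the weight matrix, so it stays exact; only the weight gradient is stochastically approximated, which is what allows most of the stored input activations to be discarded.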
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.