Memory of recurrent networks: Do we compute it right?
- URL: http://arxiv.org/abs/2305.01457v2
- Date: Tue, 10 Sep 2024 07:58:25 GMT
- Title: Memory of recurrent networks: Do we compute it right?
- Authors: Giovanni Ballarin, Lyudmila Grigoryeva, Juan-Pablo Ortega
- Abstract summary: We study the case of linear echo state networks, for which the total memory capacity has been proven to be equal to the rank of the corresponding Kalman controllability matrix.
We show that these issues, often overlooked in the recent literature, are of an exclusively numerical nature.
- Score: 5.03863830033243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical evaluations of the memory capacity (MC) of recurrent neural networks reported in the literature often contradict well-established theoretical bounds. In this paper, we study the case of linear echo state networks, for which the total memory capacity has been proven to be equal to the rank of the corresponding Kalman controllability matrix. We shed light on various reasons for the inaccurate numerical estimations of the memory, and we show that these issues, often overlooked in the recent literature, are of an exclusively numerical nature. More explicitly, we prove that when the Krylov structure of the linear MC is ignored, a gap between the theoretical MC and its empirical counterpart is introduced. As a solution, we develop robust numerical approaches by exploiting a result of MC neutrality with respect to the input mask matrix. Simulations show that the memory curves that are recovered using the proposed methods fully agree with the theory.
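To make the gap described above concrete, here is a minimal numpy sketch (not the paper's proposed robust procedure; the reservoir recursion x_t = A x_{t-1} + C z_t, the i.i.d. scalar input, and the naive sum-of-R² estimator are assumptions made for illustration) contrasting the theoretical total MC, i.e. the rank of the Kalman controllability matrix [C, AC, ..., A^{N-1} C], with a naive empirical estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                      # reservoir (state) dimension
rho = 0.9                   # spectral-radius rescaling of the reservoir matrix

# Random linear reservoir x_t = A x_{t-1} + C z_t (notation assumed here)
A = rng.standard_normal((N, N))
A *= rho / max(abs(np.linalg.eigvals(A)))
C = rng.standard_normal((N, 1))     # input mask

# Theoretical total memory capacity: rank of the Kalman controllability matrix
K = np.hstack([np.linalg.matrix_power(A, k) @ C for k in range(N)])
mc_theory = np.linalg.matrix_rank(K)

# Naive empirical estimate: sum of R^2 scores for reconstructing lagged inputs
T, burn, max_lag = 20_000, 500, 2 * N
z = rng.standard_normal(T)
x = np.zeros((T, N))
for t in range(1, T):
    x[t] = A @ x[t - 1] + C[:, 0] * z[t]

X = x[burn:]
mc_emp = 0.0
for k in range(max_lag):
    y = z[burn - k : T - k]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    mc_emp += 1.0 - np.var(y - X @ w) / np.var(y)

print(f"rank of controllability matrix (theory): {mc_theory}")
print(f"naive regression-based estimate:         {mc_emp:.2f}")
```

Any discrepancy between the two printed numbers on finite data is the kind of purely numerical artifact the paper analyses.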
Related papers
- Determinant Estimation under Memory Constraints and Neural Scaling Laws [48.68885778257016]
We derive a novel hierarchical algorithm for large-scale log-determinant calculation in memory-constrained settings.
We show that the ratio of pseudo-determinants satisfies a power-law relationship, allowing us to derive corresponding scaling laws.
This enables accurate estimation of NTK log-determinants from a tiny fraction of the full dataset.
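The paper's hierarchical algorithm is not reproduced here; the sketch below only illustrates the block Schur-complement identity, log det M = log det A + log det(D - C A^{-1} B), that block-wise, memory-constrained log-determinant computations build on (the function name and block size are illustrative):

```python
import numpy as np

def block_logdet(M, block=2):
    """Log-determinant via the block Schur-complement identity, processed one
    leading block at a time (a generic illustration, not the paper's algorithm)."""
    logdet = 0.0
    while M.shape[0] > block:
        A, B = M[:block, :block], M[:block, block:]
        C, D = M[block:, :block], M[block:, block:]
        logdet += np.linalg.slogdet(A)[1]
        # The Schur complement becomes the matrix handled in the next iteration
        M = D - C @ np.linalg.solve(A, B)
    return logdet + np.linalg.slogdet(M)[1]

# Sanity check on a random SPD matrix
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 64))
M = X @ X.T + 64 * np.eye(64)
print(block_logdet(M), np.linalg.slogdet(M)[1])
```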
arXiv Detail & Related papers (2025-03-06T13:32:13Z) - Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Memory Capacity of Nonlinear Recurrent Networks: Is it Informative? [5.03863830033243]
The total memory capacity (MC) of linear recurrent neural networks (RNNs) has been proven to be equal to the rank of the corresponding Kalman controllability matrix.
This fact calls into question the usefulness of this metric in distinguishing the performance of linear RNNs in the processing of signals.
arXiv Detail & Related papers (2025-02-07T11:06:30Z) - A relativistic continuous matrix product state study of field theories with defects [0.0]
We propose a method to compute expectation values in massive Quantum Field Theories with line defects.
We use a quantization scheme where (imaginary) time runs perpendicularly to the defect.
We demonstrate the effectiveness of this machinery by computing correlation functions of local bulk and defect operators in $\phi^4$ theory with a magnetic line defect.
arXiv Detail & Related papers (2025-01-16T19:00:23Z) - Applications of flow models to the generation of correlated lattice QCD ensembles [69.18453821764075]
Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters.
This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables.
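As a generic toy illustration of why correlation helps (not the paper's lattice setup; the observable and the way correlation is induced are assumptions here): the variance of a difference of observables over correlated samples is far smaller than the sum of variances one would pay with independent ensembles.

```python
import numpy as np

rng = np.random.default_rng(0)
# "Ensembles" at two nearby parameter values: B is a small perturbation of A,
# standing in for statistically correlated configurations.
A = rng.standard_normal(10_000)
B = A + 0.1 * rng.standard_normal(10_000)

obs_A, obs_B = A ** 2, B ** 2                # toy observable per configuration
var_independent = obs_A.var() + obs_B.var()  # variance of the difference, independent ensembles
var_correlated = (obs_B - obs_A).var()       # variance of the difference, correlated ensembles
print(var_independent, var_correlated)
```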
arXiv Detail & Related papers (2024-01-19T18:33:52Z) - The Decimation Scheme for Symmetric Matrix Factorization [0.0]
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications.
We study this extensive rank problem, extending the alternative 'decimation' procedure that we recently introduced.
We introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization.
arXiv Detail & Related papers (2023-07-31T10:53:45Z) - Inferring networks from time series: a neural approach [3.115375810642661]
We present a powerful computational method to infer large network adjacency matrices from time series data using a neural network.
We demonstrate our capabilities by inferring line failure locations in the British power grid from its response to a power cut.
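A minimal sketch of the general idea, with everything below assumed for illustration (toy tanh network dynamics and a learnable adjacency fitted by one-step prediction; this is not the paper's architecture):

```python
import torch

torch.manual_seed(0)
n, T = 10, 500
A_true = (torch.rand(n, n) < 0.2).float()   # unknown adjacency to recover

# Simulate noisy nonlinear network dynamics x_{t+1} = tanh(A_true x_t) + noise
xs = [torch.randn(n)]
for _ in range(T):
    xs.append(torch.tanh(A_true @ xs[-1]) + 0.05 * torch.randn(n))
traj = torch.stack(xs)

# Fit a learnable adjacency matrix by one-step-ahead prediction
A = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.Adam([A], lr=0.05)
for _ in range(2000):
    pred = torch.tanh(traj[:-1] @ A.T)
    loss = ((pred - traj[1:]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("max entrywise error of recovered A:", (A.detach() - A_true).abs().max().item())
```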
arXiv Detail & Related papers (2023-03-30T15:51:01Z) - Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - Log-linear Guardedness and its Implications [116.87322784046926]
Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful.
This work formally defines the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation.
We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.
arXiv Detail & Related papers (2022-10-18T17:30:02Z) - Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks such as Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
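A hedged sketch of one way such a reduction can look, assuming a randomized row-subsampling estimate of the weight gradient (class name, sampling scheme, and scaling are illustrative, not the paper's exact method): only a subset of input rows is stored for the backward pass, while the gradient with respect to the input remains exact.

```python
import torch

class SubsampledLinear(torch.autograd.Function):
    """y = x @ W^T, but only k randomly chosen rows of x are saved; the weight
    gradient is an unbiased randomized estimate (illustrative sketch only)."""

    @staticmethod
    def forward(ctx, x, weight, k=64):
        n = x.shape[0]
        idx = torch.randperm(n, device=x.device)[:min(k, n)]
        ctx.save_for_backward(x[idx], weight, idx)
        ctx.n = n
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_sub, weight, idx = ctx.saved_tensors
        scale = ctx.n / x_sub.shape[0]
        grad_x = grad_out @ weight                   # exact, needs no stored activation
        grad_w = scale * grad_out[idx].t() @ x_sub   # unbiased randomized estimate
        return grad_x, grad_w, None

x = torch.randn(512, 256, requires_grad=True)
W = torch.randn(128, 256, requires_grad=True)
y = SubsampledLinear.apply(x, W, 32)
y.sum().backward()
print(W.grad.shape, x.grad.shape)
```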
arXiv Detail & Related papers (2022-01-31T13:02:41Z) - Cram\'er-Rao bound-informed training of neural networks for quantitative
MRI [11.964144201247198]
Neural networks are increasingly used to estimate parameters in quantitative MRI, in particular in magnetic resonance fingerprinting.
Their advantages are their superior speed and their ability to dominate the non-efficient unbiased estimator.
We find, however, that heterogeneous parameters are hard to estimate.
We propose a well-founded Cramér-Rao loss function, which normalizes the squared error with the respective Cramér-Rao bound (CRB).
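Read as code, that normalization might look like the following toy sketch (the function name and the per-parameter CRB values are assumptions; the paper's exact loss may differ):

```python
import numpy as np

def crb_normalized_loss(theta_hat, theta, crb):
    """Squared error of each parameter divided by its Cramér-Rao bound, so
    heterogeneous parameters contribute on a comparable scale (schematic)."""
    return np.mean((theta_hat - theta) ** 2 / crb)

# Toy example: two parameters with very different intrinsic variances
theta     = np.array([1.0, 100.0])
theta_hat = np.array([1.1, 105.0])
crb       = np.array([0.01, 25.0])   # per-parameter CRB (assumed known)
print(crb_normalized_loss(theta_hat, theta, crb))
```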
arXiv Detail & Related papers (2021-09-22T06:38:03Z) - Error Bounds of the Invariant Statistics in Machine Learning of Ergodic
It\^o Diffusions [8.627408356707525]
We study the theoretical underpinnings of machine learning of ergodic Itô diffusions.
We deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients.
arXiv Detail & Related papers (2021-05-21T02:55:59Z) - Bayesian Uncertainty Estimation of Learned Variational MRI
Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z) - Constant-Expansion Suffices for Compressed Sensing with Generative
Priors [26.41623833920794]
We prove a novel uniform concentration result for random functions that might not be Lipschitz but satisfy a relaxed notion of Lipschitzness.
Since the WDC is a fundamental concentration inequality at the heart of all existing theoretical guarantees on this problem, our bound yields improvements in all known results on compressed sensing with generative priors, including one-bit recovery, low-rank matrix recovery, and more.
arXiv Detail & Related papers (2020-06-07T19:14:41Z) - Dropout: Explicit Forms and Capacity Control [57.36692251815882]
We investigate capacity control provided by dropout in various machine learning problems.
In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
We evaluate our theoretical findings on real-world datasets, including MovieLens, MNIST, and Fashion-MNIST.
arXiv Detail & Related papers (2020-03-06T19:10:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.