State space models, emergence, and ergodicity: How many parameters are needed for stable predictions?
- URL: http://arxiv.org/abs/2409.13421v1
- Date: Fri, 20 Sep 2024 11:39:37 GMT
- Title: State space models, emergence, and ergodicity: How many parameters are needed for stable predictions?
- Authors: Ingvar Ziemann, Nikolai Matni, George J. Pappas
- Abstract summary: We show that tasks exhibiting substantial long-range correlation require a certain critical number of parameters.
We also investigate the role of the learner's parametrization and consider a simple version of a linear dynamical system with hidden state.
- Score: 28.65576793023554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How many parameters are required for a model to execute a given task? It has been argued that large language models, pre-trained via self-supervised learning, exhibit emergent capabilities such as multi-step reasoning as their number of parameters reaches a critical scale. In the present work, we explore whether this phenomenon can analogously be replicated in a simple theoretical model. We show that the problem of learning linear dynamical systems -- a simple instance of self-supervised learning -- exhibits a corresponding phase transition. Namely, for every non-ergodic linear system there exists a critical threshold such that a learner using fewer parameters than said threshold cannot achieve bounded error for large sequence lengths. Put differently, in our model we find that tasks exhibiting substantial long-range correlation require a certain critical number of parameters -- a phenomenon akin to emergence. We also investigate the role of the learner's parametrization and consider a simple version of a linear dynamical system with hidden state -- an imperfectly observed random walk in $\mathbb{R}$. For this situation, we show that there exists no learner using a linear filter which can successfully learn the random walk unless the filter length exceeds a certain threshold depending on the effective memory length and horizon of the problem.
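As a rough illustration (the notation here is ours, not taken from the paper), the hidden-state example can be written as an imperfectly observed random walk with a length-$p$ linear-filter predictor:
\[
x_{t+1} = x_t + w_t, \qquad y_t = x_t + v_t, \qquad w_t \sim \mathcal{N}(0,\sigma_w^2), \; v_t \sim \mathcal{N}(0,\sigma_v^2),
\]
\[
\hat{x}_{t+1} = \sum_{i=0}^{p-1} \theta_i \, y_{t-i}, \qquad \theta \in \mathbb{R}^p .
\]
Informally, the abstract's claim is that no choice of filter coefficients $\theta$ keeps the prediction error bounded over long horizons unless the filter length $p$ exceeds a threshold determined by the effective memory length and the horizon of the problem.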
Related papers
- Identifying overparameterization in Quantum Circuit Born Machines [1.7259898169307613]
We study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient methods.
Our results indicate that fully understanding the trainability of these models remains an open question.
arXiv Detail & Related papers (2023-07-06T21:05:22Z) - Neural network analysis of neutron and X-ray reflectivity data:
Incorporating prior knowledge for tackling the phase problem [141.5628276096321]
We present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces.
We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization.
In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem.
arXiv Detail & Related papers (2023-06-28T11:15:53Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Particle-Based Score Estimation for State Space Model Learning in
Autonomous Driving [62.053071723903834]
Multi-object state estimation is a fundamental problem for robotic applications.
We consider learning maximum-likelihood parameters using particle methods.
We apply our method to real data collected from autonomous vehicles.
arXiv Detail & Related papers (2022-12-14T01:21:05Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Neural parameter calibration for large-scale multi-agent models [0.7734726150561089]
We present a method to retrieve accurate probability densities for model parameters by combining neural networks with the model's differential equations.
The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems.
arXiv Detail & Related papers (2022-09-27T17:36:26Z) - A Causality-Based Learning Approach for Discovering the Underlying
Dynamics of Complex Systems from Partial Observations with Stochastic
Parameterization [1.2882319878552302]
This paper develops a new iterative learning algorithm for complex turbulent systems with partial observations.
It alternates between identifying model structures, recovering unobserved variables, and estimating parameters.
Numerical experiments show that the new algorithm succeeds in identifying the model structure and providing suitable parameterizations for many complex nonlinear systems.
arXiv Detail & Related papers (2022-08-19T00:35:03Z) - Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z) - Sufficiently Accurate Model Learning for Planning [119.80502738709937]
This paper introduces the constrained Sufficiently Accurate model learning approach.
It provides examples of such problems, and presents a theorem on how close some approximate solutions can be.
The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.
arXiv Detail & Related papers (2021-02-11T16:27:31Z) - Provable Benefits of Overparameterization in Model Compression: From
Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - in building lightweight models.
This paper sheds light on these empirical findings by providing a high-dimensional theoretical characterization of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z) - Variational Inference and Learning of Piecewise-linear Dynamical Systems [33.23231229260119]
We propose a variational approximation of piecewise linear dynamical systems.
We show that the model parameters can be split into two sets, static and dynamic, and that the static parameters can be estimated off-line together with the number of linear modes, i.e., the number of states of the switching variable (see the illustrative sketch after this list).
arXiv Detail & Related papers (2020-06-02T14:40:35Z)
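To make the switching structure in the last entry concrete, here is a generic piecewise-linear (switching) dynamical system; the notation is illustrative and not taken from that paper:
\[
z_t \in \{1,\dots,K\}, \qquad x_t = A_{z_t} x_{t-1} + b_{z_t} + w_t, \qquad y_t = C_{z_t} x_t + v_t,
\]
where, loosely, the static parameters (the per-mode matrices $A_k, b_k, C_k$ and the number of modes $K$) can be estimated off-line, while the dynamic quantities (the switching sequence $z_t$ and the hidden states $x_t$) are inferred at run time.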