Related papers: Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks

URL: http://arxiv.org/abs/2511.02401v1
Date: Tue, 04 Nov 2025 09:30:31 GMT
Title: Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks
Authors: Yessin Moakher, Malik Tiomoko, Cosme Louart, Zhenyu Liao
Abstract summary: We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. We apply Random Matrix Theory to derive a closed-form expression for the generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance.
Score: 7.721672385781673
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range of architectures, from deep random-feature models to echo-state networks (ESNs) with recurrent dynamics. Working in the high-dimensional regime, we apply Random Matrix Theory to derive a closed-form expression for the asymptotic generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance. Surprisingly, we show that a linear ESN is equivalent to ridge regression with an exponentially time-weighted ("memory") input covariance, revealing a clear inductive bias toward recent inputs. Experiments match predictions: ESNs win in low-sample, short-memory regimes, while ridge prevails with more data or long-range dependencies. Our methodology provides a general framework for analyzing overparameterized models and offers insights into the behavior of deep learning networks.
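
Illustrative sketch (not taken from the paper): the "exponentially time-weighted" memory mentioned in the abstract can be seen in a toy scalar linear reservoir. The recurrence s_t = decay * s_{t-1} + u_t unrolls to a sum of past inputs weighted by decay**k, so a ridge readout fitted on s_t is effectively a regression on an exponentially discounted input history. The scalar state, the function names, the value of `decay`, and the synthetic targets below are all assumptions made for illustration; the paper analyzes high-dimensional reservoirs with Random Matrix Theory, which this sketch does not attempt.

```python
import random


def linear_reservoir_states(inputs, decay):
    """Run the scalar linear recurrence s_t = decay * s_{t-1} + u_t."""
    state = 0.0
    states = []
    for u in inputs:
        state = decay * state + u
        states.append(state)
    return states


def exponentially_weighted_sum(inputs, decay, t):
    """Unrolled form of the same state: s_t = sum_k decay**k * u_{t-k}."""
    return sum(decay ** k * inputs[t - k] for k in range(t + 1))


def ridge_readout(states, targets, lam):
    """Closed-form one-dimensional ridge fit: w = <s, y> / (<s, s> + lam)."""
    numerator = sum(s * y for s, y in zip(states, targets))
    denominator = sum(s * s for s in states) + lam
    return numerator / denominator


random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(200)]
decay = 0.8  # hypothetical memory parameter, chosen with |decay| < 1

states = linear_reservoir_states(inputs, decay)

# Synthetic targets that depend only on the two most recent inputs.
targets = [
    inputs[t] + (0.5 * inputs[t - 1] if t > 0 else 0.0)
    for t in range(len(inputs))
]

# Trainable readout on top of the frozen (here: scalar) representation.
w = ridge_readout(states, targets, lam=1.0)

# Largest gap between the recurrent state and its unrolled weighted sum.
max_gap = max(
    abs(states[t] - exponentially_weighted_sum(inputs, decay, t))
    for t in range(len(inputs))
)
print("readout weight:", w)
print("max gap between recurrence and unrolled sum:", max_gap)
```

A smaller `decay` shortens the effective memory, because older inputs are discounted faster. This matches the short-memory bias the abstract describes only qualitatively.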

Related papers

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Recurrent Stochastic Configuration Networks with Hybrid Regularization for Nonlinear Dynamics Modelling [3.8719670789415925]
Recurrent stochastic configuration networks (RSCNs) have shown great potential in modelling nonlinear dynamic systems with uncertainties. This paper presents an RSCN with hybrid regularization to enhance both the learning capacity and generalization performance of the network.
arXiv Detail & Related papers (2024-11-26T03:06:39Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
Koopman Kernel Regression [6.116741319526748]
We show that Koopman operator theory offers a beneficial paradigm for characterizing forecasts via linear time-invariant (LTI) ODEs. We derive a universal Koopman-invariant reproducing kernel Hilbert space (RKHS) that solely spans transformations into LTI dynamical systems. Our experiments demonstrate superior forecasting performance compared to Koopman operator and sequential data predictors.
arXiv Detail & Related papers (2023-05-25T16:22:22Z)
How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs (RSNs) with ReLU activation. We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z)
A predictive physics-aware hybrid reduced order model for reacting flows [65.73506571113623]
A new hybrid predictive Reduced Order Model (ROM) is proposed to solve reacting flow problems. The number of degrees of freedom is reduced from thousands of temporal points to a few POD modes with their corresponding temporal coefficients. Two different deep learning architectures have been tested to predict the temporal coefficients.
arXiv Detail & Related papers (2023-01-24T08:39:20Z)
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory. Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable solver for semi-implicit variational inference (SIVI). Our method maps SIVI's evidence to a rigorous inference of lower gradient values.
arXiv Detail & Related papers (2021-01-15T11:39:09Z)
CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables. CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
Hierarchical regularization networks for sparsification based learning on noisy datasets [0.0]
The hierarchy follows from approximation spaces identified at successively finer scales. To promote model generalization at each scale, we also introduce a novel, projection-based penalty operator across multiple dimensions. Results show the performance of the approach as a data reduction and modeling strategy on both synthetic and real datasets.
arXiv Detail & Related papers (2020-06-09T18:32:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.