A Random Matrix Perspective of Echo State Networks: From Precise Bias--Variance Characterization to Optimal Regularization
- URL: http://arxiv.org/abs/2509.22011v1
- Date: Fri, 26 Sep 2025 07:47:39 GMT
- Title: A Random Matrix Perspective of Echo State Networks: From Precise Bias--Variance Characterization to Optimal Regularization
- Authors: Yessin Moakher, Malik Tiomoko, Cosme Louart, Zhenyu Liao,
- Abstract summary: We present a rigorous analysis of Echo State Networks (ESNs) in a teacher student setting with a linear teacher oracle with weights.<n>We derive closed form expressions for the bias, variance, and mean-squared error (MSE) as functions of the input statistics, the oracle vector, and the ridge regularization parameter.
- Score: 7.721672385781673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a rigorous asymptotic analysis of Echo State Networks (ESNs) in a teacher student setting with a linear teacher with oracle weights. Leveraging random matrix theory, we derive closed form expressions for the asymptotic bias, variance, and mean-squared error (MSE) as functions of the input statistics, the oracle vector, and the ridge regularization parameter. The analysis reveals two key departures from classical ridge regression: (i) ESNs do not exhibit double descent, and (ii) ESNs attain lower MSE when both the number of training samples and the teacher memory length are limited. We further provide an explicit formula for the optimal regularization in the identity input covariance case, and propose an efficient numerical scheme to compute the optimum in the general case. Together, these results offer interpretable theory and practical guidelines for tuning ESNs, helping reconcile recent empirical observations with provable performance guarantees
Related papers
- Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message Passing [36.35215381372567]
This paper proposes optimal spectral estimators and an approximate message passing (AMP) algorithm.<n>Building on the paradigm of spectrally-d iterative optimization, this paper proposes optimal spectral estimators and an approximate message passing (AMP) algorithm.
arXiv Detail & Related papers (2026-02-09T22:08:09Z) - Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems.<n>VarEOT is based on an exact variational reformulation of the log-partition $log mathbbE[exp(cdot)$ as a tractable generalization over an auxiliary positive normalizer.<n> Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
arXiv Detail & Related papers (2026-02-02T15:48:44Z) - Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models [5.076419064097734]
We analyze a matrix Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling.<n>Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton along the optimization path yields tighter stability bounds.
arXiv Detail & Related papers (2025-11-06T01:50:45Z) - Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks [7.721672385781673]
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer.<n>We apply Random Matrix Theory to derive a closed-form expression for the generalization error.<n>We then apply this analysis to recurrent representations and obtain concise formula that characterize their performance.
arXiv Detail & Related papers (2025-11-04T09:30:31Z) - Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vectorile regression (CVQR)<n>CVQR combines neural optimal transport with quantized optimization, and apply it to predictions.
arXiv Detail & Related papers (2025-09-29T19:50:19Z) - Optimal Condition for Initialization Variance in Deep Neural Networks: An SGD Dynamics Perspective [0.0]
gradient descent (SGD) is one of the most fundamental optimization algorithms in machine learning (ML)<n>We study the relationship between the quasi-stationary distribution derived from this equation and the initial distribution through the Kullback-Leibler (KL) divergence.<n>We experimentally confirm our theoretical results by using the classical SGD to train fully connected neural networks on the MNIST and Fashion-MNIST datasets.
arXiv Detail & Related papers (2025-08-18T11:18:12Z) - Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [4.554284689395686]
implicit gradient descent (IGD) outperforms the common gradient descent (GD) algorithm in handling certain multi-scale problems.<n>We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measures the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributional robust optimization AUC.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Double Descent in Random Feature Models: Precise Asymptotic Analysis for
General Convex Regularization [4.8900735721275055]
We provide precise expressions for the generalization of regression under a broad class of convex regularization terms.
We numerically demonstrate the predictive capacity of our framework, and show experimentally that the predicted test error is accurate even in the non-asymptotic regime.
arXiv Detail & Related papers (2022-04-06T08:59:38Z) - Learning Stochastic Graph Neural Networks with Constrained Variance [18.32587282139282]
graph neural networks (SGNNs) are information processing architectures that learn representations from data over random graphs.
We propose a variance-constrained optimization problem for SGNNs, balancing the expected performance and the deviation.
An alternating gradient-dual learning procedure is undertaken that solves the problem by updating the SGNN parameters with descent and the dual variable with ascent.
arXiv Detail & Related papers (2022-01-29T15:55:58Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit extrapolational (SIVI)
Our method maps SIVI's evidence to a rigorous inference of lower gradient values.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - Stochastic Learning Approach to Binary Optimization for Optimal Design
of Experiments [0.0]
We present a novel approach to binary optimization for optimal experimental design (OED) for Bayesian inverse problems governed by mathematical models such as partial differential equations.
The OED utility function, namely, the regularized optimality gradient, is cast into an objective function in the form of an expectation over a Bernoulli distribution.
The objective is then solved by using a probabilistic optimization routine to find an optimal observational policy.
arXiv Detail & Related papers (2021-01-15T03:54:12Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.