Related papers: Generalization Analysis and Method for Domain Generalization for a Family of Recurrent Neural Networks

URL: http://arxiv.org/abs/2601.08122v1
Date: Tue, 13 Jan 2026 01:27:10 GMT
Title: Generalization Analysis and Method for Domain Generalization for a Family of Recurrent Neural Networks
Authors: Atefeh Termehchi, Ekram Hossain, Isaac Woungang
Abstract summary: This paper proposes a method to analyze interpretability and out-of-domain (OOD) generalization for a family of recurrent neural networks (RNNs). Specifically, the evolution of a trained RNN's states is modeled as an unknown, discrete-time, nonlinear closed-loop feedback system. A domain generalization method is proposed that reduces the OOD generalization error and improves the robustness to distribution shifts.
Score: 11.316438734182931
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Deep learning (DL) has driven broad advances across scientific and engineering domains. Despite its success, DL models often exhibit limited interpretability and generalization, which can undermine trust, especially in safety-critical deployments. As a result, there is growing interest in (i) analyzing interpretability and generalization and (ii) developing models that perform robustly under data distributions different from those seen during training (i.e. domain generalization). However, the theoretical analysis of DL remains incomplete. For example, many generalization analyses assume independent samples, which is violated in sequential data with temporal correlations. Motivated by these limitations, this paper proposes a method to analyze interpretability and out-of-domain (OOD) generalization for a family of recurrent neural networks (RNNs). Specifically, the evolution of a trained RNN's states is modeled as an unknown, discrete-time, nonlinear closed-loop feedback system. Using Koopman operator theory, these nonlinear dynamics are approximated with a linear operator, enabling interpretability. Spectral analysis is then used to quantify the worst-case impact of domain shifts on the generalization error. Building on this analysis, a domain generalization method is proposed that reduces the OOD generalization error and improves the robustness to distribution shifts. Finally, the proposed analysis and domain generalization approach are validated on practical temporal pattern-learning tasks.
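The abstract's idea of approximating nonlinear state dynamics with a linear operator and then inspecting its spectrum can be illustrated with a minimal sketch. This is not the authors' method: it assumes a small vanilla tanh RNN with random weights and uses a plain least-squares (DMD-style) fit on raw states as a stand-in for a Koopman approximation. All names here (`simulate_states`, `fit_linear_operator`, the sizes `n`, `m`, `T`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_states(W, U, inputs, h0):
    """Roll out a vanilla tanh RNN; returns the state trajectory, shape (T+1, n)."""
    states = [h0]
    for x in inputs:
        states.append(np.tanh(W @ states[-1] + U @ x))
    return np.stack(states)

def fit_linear_operator(states):
    """Least-squares fit of K with states[t+1] ~= K @ states[t] (DMD-style).

    Assumption: raw states serve as observables; a richer dictionary of
    observables would normally be used for a Koopman approximation.
    """
    X = states[:-1].T  # current states, shape (n, T)
    Y = states[1:].T   # next states, shape (n, T)
    return Y @ np.linalg.pinv(X)

# Hypothetical toy setup: random weights stand in for a trained RNN.
rng = np.random.default_rng(0)
n, m, T = 4, 2, 200
W = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
U = 0.1 * rng.standard_normal((n, m))
inputs = rng.standard_normal((T, m))
states = simulate_states(W, U, inputs, np.zeros(n))

K = fit_linear_operator(states)

# Spectral summary of the fitted linear operator.
eigvals = np.linalg.eigvals(K)
spectral_radius = float(np.max(np.abs(eigvals)))

# Relative one-step prediction error of the linear surrogate (Frobenius norm).
Y = states[1:].T
one_step_error = float(np.linalg.norm(Y - K @ states[:-1].T) / np.linalg.norm(Y))

print("spectral radius:", spectral_radius)
print("relative one-step error:", one_step_error)
```

In this sketch a spectral radius below 1 suggests that perturbations of the state decay under the linear surrogate, which is the kind of quantity a spectral analysis of domain shift might examine. The fit ignores the external input, so it only roughly reflects a driven, closed-loop system.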

Related papers

Unifying Learning Dynamics and Generalization in Transformers Scaling Law [1.5229257192293202]
The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improvements in model performance with increasing computational resources. This work formalizes the learning dynamics of transformer-based language models as an ordinary differential equation (ODE) system. Our analysis characterizes the convergence of generalization error to the irreducible risk as computational resources scale with data.
arXiv Detail & Related papers (2025-12-26T17:20:09Z)
Algorithm- and Data-Dependent Generalization Bounds for Score-Based Generative Models [27.78637798976204]
Score-based generative models (SGMs) have emerged as one of the most popular classes of generative models. This paper provides the first algorithmic- and data-dependent analysis for SGMs. In particular, we account for the dynamics of the learning algorithm, offering new insights into the behavior of SGMs.
arXiv Detail & Related papers (2025-06-04T11:33:04Z)
Generalization for Least Squares Regression With Simple Spiked Covariances [3.9134031118910264]
The generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood. Recent work has made progress by describing the spectrum of the feature matrix at the hidden layer. Yet, the generalization error for linear models with spiked covariances has not been previously determined.
arXiv Detail & Related papers (2024-10-17T19:46:51Z)
GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks. GIT combines the usage of gradient information and invariance transformations. Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples. We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
arXiv Detail & Related papers (2023-02-24T01:03:56Z)
Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory. Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point. Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression [19.047997113063147]
We introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory. We perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting.
arXiv Detail & Related papers (2022-02-12T12:42:36Z)
Generalization Error of Generalized Linear Models in High Dimensions [25.635225717360466]
We provide a framework to characterize the generalization error of generalized linear models with arbitrary non-linearities. Our model also captures mismatch between training and test distributions, with linear regression and logistic regression treated as special cases.
arXiv Detail & Related papers (2020-05-01T02:17:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.