The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks
- URL: http://arxiv.org/abs/2601.06961v1
- Date: Sun, 11 Jan 2026 15:37:39 GMT
- Title: The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks
- Authors: Taishi Watanabe, Ryo Karakida, Jun-nosuke Teramae
- Abstract summary: We study the impact of data anisotropy on the learning dynamics and generalization error of a two-layer linear network in a linear regression setting. Our findings offer deep theoretical insights into how data anisotropy shapes the learning trajectory and final performance.
- Score: 10.58305210918603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep neural networks largely depends on the statistical structure of the training data. While learning dynamics and generalization on isotropic data are well-established, the impact of pronounced anisotropy on these crucial aspects is not yet fully understood. We examine the impact of data anisotropy, represented by a spiked covariance structure, a canonical yet tractable model, on the learning dynamics and generalization error of a two-layer linear network in a linear regression setting. Our analysis reveals that the learning dynamics proceed in two distinct phases, governed initially by the input-output correlation and subsequently by other principal directions of the data structure. Furthermore, we derive an analytical expression for the generalization error, quantifying how the alignment of the spike structure of the data with the learning task improves performance. Our findings offer deep theoretical insights into how data anisotropy shapes the learning trajectory and final performance, providing a foundation for understanding complex interactions in more advanced network architectures.
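The setting described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's exact analysis: the hypothetical parameters below (dimension, spike strength `alpha`, learning rate, step count) are chosen for demonstration, the input covariance is the spiked model Sigma = I + alpha * u u^T, and the regression target is perfectly aligned with the spike direction u.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's exact formulation):
# a two-layer linear network trained by full-batch gradient descent on
# linear-regression data with spiked covariance Sigma = I + alpha * u u^T.
rng = np.random.default_rng(0)
d, n, hidden = 50, 500, 20            # input dim, samples, hidden width

u = rng.standard_normal(d)
u /= np.linalg.norm(u)                # unit-norm spike direction
alpha = 9.0
cov = np.eye(d) + alpha * np.outer(u, u)

# Anisotropic inputs x ~ N(0, Sigma); teacher aligned with the spike
X = rng.standard_normal((n, d)) @ np.linalg.cholesky(cov).T
y = X @ u

# Two-layer linear network f(x) = W2 @ W1 @ x, small random init
W1 = 0.01 * rng.standard_normal((hidden, d))
W2 = 0.01 * rng.standard_normal((1, hidden))
lr, losses = 0.05, []
for _ in range(300):
    err = X @ W1.T @ W2.T - y[:, None]        # (n, 1) residuals
    losses.append(0.5 * float(np.mean(err ** 2)))
    gW2 = err.T @ (X @ W1.T) / n              # gradient w.r.t. W2
    gW1 = W2.T @ err.T @ X / n                # gradient w.r.t. W1
    W2 -= lr * gW2
    W1 -= lr * gW1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.5f}")
```

Because the task is aligned with the spike, the strongly weighted direction dominates the early error signal, and the loss curve exhibits the plateau-then-drop shape typical of small-initialization linear networks; plotting `losses` makes the phase structure visible.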
Related papers
- From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning [27.3606707777401]
We provide a formal analysis of weak-to-strong generalization from a linear CNN (weak) to a two-layer ReLU CNN (strong). Our analysis identifies two regimes -- data-scarce and data-abundant -- based on the signal-to-noise characteristics of the dataset.
arXiv Detail & Related papers (2025-10-28T07:53:24Z)
- Binarized Neural Networks Converge Toward Algorithmic Simplicity: Empirical Support for the Learning-as-Compression Hypothesis [33.73453802399709]
We propose a shift toward algorithmic information theory, using Binarized Neural Networks (BNNs) as a first proxy. We apply the Block Decomposition Method (BDM) and demonstrate that it tracks structural changes during training more closely than entropy does. These results support the view of training as a process of algorithmic compression, where learning corresponds to the progressive internalization of structured regularities.
arXiv Detail & Related papers (2025-05-27T02:51:36Z)
- Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
- Generalization Performance of Hypergraph Neural Networks [21.483543928698676]
We develop margin-based generalization bounds for four representative classes of hypergraph neural networks. Our results reveal the manner in which hypergraph structure and spectral norms of the learned weights can affect the generalization bounds. Our empirical study examines the relationship between the practical performance and theoretical bounds of the models over synthetic and real-world datasets.
arXiv Detail & Related papers (2025-01-22T00:20:26Z)
- Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient [0.49478969093606673]
We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity grounded in singular learning theory.
We study the development of internal structure in transformer language models during training.
arXiv Detail & Related papers (2024-10-03T20:51:02Z)
- Generalization bound for estimating causal effects from observational network data [25.055822137402746]
We derive a generalization bound for causal effect estimation in network scenarios by exploiting 1) a reweighting schema based on the joint propensity score and 2) a representation learning schema based on the Integral Probability Metric (IPM).
Motivated by the analysis of the bound, we propose a weighting regression method based on the joint propensity score augmented with representation learning.
arXiv Detail & Related papers (2023-08-08T03:14:34Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.