Understanding Generalization from Embedding Dimension and Distributional Convergence
- URL: http://arxiv.org/abs/2601.22756v1
- Date: Fri, 30 Jan 2026 09:32:04 GMT
- Title: Understanding Generalization from Embedding Dimension and Distributional Convergence
- Authors: Junjie Yu, Zhuoli Ouyang, Haotian Deng, Chen Wei, Wenxiao Ma, Jianyu Zhang, Zihan Deng, Quanying Liu,
- Abstract summary: We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model.<n>We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants.
- Score: 13.491874401333021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by embedding dimension, explaining its strong empirical correlation with generalization performance. Experiments across architectures and datasets validate the theory and demonstrate the utility of embedding-based diagnostics.
Related papers
- Conjugate Learning Theory: Uncovering the Mechanisms of Trainability and Generalization in Deep Neural Networks [0.0]
We develop a conjugate learning theoretical framework based on convex conjugate duality to characterize this learnability property.<n>We demonstrate that training deep neural networks (DNNs) with mini-batch descent (SGD) achieves global optima of empirical risk.<n>We derive deterministic and probabilistic bounds on generalization error based on conditional generalized entropy measures.
arXiv Detail & Related papers (2026-02-18T04:26:55Z) - Nonparametric Identification and Inference for Counterfactual Distributions with Confounding [6.997978440999076]
We propose nonparametric identification and semiparametric estimation of joint potential outcome in the presence of confounding.<n>By bridging classical semiparametric theory with modern representation learning, this work provides a robust statistical foundation for distributional and counterfactual inference in complex causal systems.
arXiv Detail & Related papers (2026-02-17T05:00:13Z) - A Multiplicative Neural Network Architecture: Locality and Regularity of Appriximation [1.7188280334580195]
We introduce a multiplicative neural network architecture in which multiplicative interactions constitute the fundamental representation.<n>We analyze its approximation properties in terms of locality and regularity in Bessel potential spaces.
arXiv Detail & Related papers (2026-02-06T04:07:50Z) - Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds [63.47271262149291]
We propose a unified framework for PAC-Bayesian norm-based generalization.<n>The key to our approach is a sensitivity matrix that quantifies the network outputs with respect to structured weight perturbations.<n>We derive a family of generalization bounds that recover several existing PAC-Bayesian results as special cases.
arXiv Detail & Related papers (2026-01-13T00:42:22Z) - Generalization Below the Edge of Stability: The Role of Data Geometry [60.147710896851045]
We show how data geometry controls generalization in ReLU networks trained below the edge of stability.<n>For data distributions supported on a mixture of low-dimensional balls, we derive generalization bounds that provably adapt to the intrinsic dimension.<n>Our results consolidate disparate empirical findings that have appeared in the literature.
arXiv Detail & Related papers (2025-10-20T21:40:36Z) - Error Bounds of Supervised Classification from Information-Theoretic Perspective [0.0]
We explore bounds on the expected risk when using deep neural networks for supervised classification from an information theoretic perspective.
We introduce model risk and fitting error, which are derived from further decomposing the empirical risk.
arXiv Detail & Related papers (2024-06-07T01:07:35Z) - On the Generalization Properties of Diffusion Models [31.067038651873126]
This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models.<n>We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models.<n>We extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities.
arXiv Detail & Related papers (2023-11-03T09:20:20Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics
for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.