A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization
- URL: http://arxiv.org/abs/2602.10680v1
- Date: Wed, 11 Feb 2026 09:31:29 GMT
- Title: A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization
- Authors: Vicente Conde Mendes, Lorenzo Bardone, Cédric Koller, Jorge Medina Moreira, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová
- Abstract summary: We introduce a tractable high-dimensional spiked model with two latent factors. PCA and linear autoencoders fail to recover the factor that appears only in higher-order moments, while a minimal nonlinear autoencoder provably extracts both. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality.
- Score: 12.791897914144378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way, even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model where this advantage can be rigorously analyzed has remained an open theoretical challenge. We introduce a tractable high-dimensional spiked model with two latent factors: one visible to covariance, and one statistically dependent yet uncorrelated, appearing only in higher-order moments. PCA and linear autoencoders fail to recover the latter, while a minimal nonlinear autoencoder provably extracts both. We analyze both the population risk and empirical risk minimization. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality: nonlinear autoencoders recover latent structure that linear methods miss, even though their reconstruction loss is higher.
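The phrase "statistically dependent yet uncorrelated" in the abstract can be illustrated with a toy example. The sketch below is not the paper's spiked model: it assumes a standard Gaussian latent `u` and a hypothetical second factor `v = u**2 - 1`, chosen only because it is fully determined by `u` while having zero covariance with it. The function name and parameters are likewise made up for illustration.

```python
import numpy as np


def dependent_but_uncorrelated(n_samples=200_000, seed=0):
    """Toy illustration (an assumption, not the paper's model).

    Returns the sample correlation between u and v, and between u**2 and v,
    where u is standard Gaussian and v = u**2 - 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n_samples)

    # v is a deterministic function of u, so the two are strongly dependent,
    # yet E[u * v] = E[u**3] - E[u] = 0 for a standard Gaussian.
    v = u**2 - 1.0

    # A covariance-based (linear) statistic sees almost nothing.
    linear_corr = np.corrcoef(u, v)[0, 1]

    # A nonlinear feature of u (here u**2) exposes the dependence.
    nonlinear_corr = np.corrcoef(u**2, v)[0, 1]
    return linear_corr, nonlinear_corr


if __name__ == "__main__":
    lin, nonlin = dependent_but_uncorrelated()
    print(f"corr(u, v)    = {lin:.4f}")     # close to 0
    print(f"corr(u**2, v) = {nonlin:.4f}")  # essentially 1
```

The same mechanism is why a method that only looks at second-order statistics, such as PCA, can miss a factor that a nonlinear feature map picks up; how the paper embeds such a factor in high-dimensional data is specified in the paper itself, not here.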
Related papers
- On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis [0.01269104766024433]
We formalise self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system. We derive two fundamental failure modes: (1) Entropy Decay, where finite sampling effects cause a monotonic loss of distributional diversity (mode collapse), and (2) Variance Amplification, where the loss of external grounding causes the model's representation of truth to drift as a random walk.
arXiv Detail & Related papers (2026-01-05T19:50:49Z)
- Nonlinear Multiple Response Regression and Learning of Latent Spaces [2.6113259186042876]
We introduce a unified method capable of learning latent spaces in both unsupervised and supervised settings. Unlike other neural network methods that operate as "black boxes", our approach not only offers better interpretability but also reduces computational complexity.
arXiv Detail & Related papers (2025-03-27T15:28:06Z)
- Robustness of Nonlinear Representation Learning [60.15898117103069]
We study the problem of unsupervised representation learning in slightly misspecified settings. We show that the mixing can be identified up to linear transformations and small errors. Those results are a step towards identifiability results for unsupervised representation learning for real-world data.
arXiv Detail & Related papers (2025-03-19T15:57:03Z)
- Adversarial Dependence Minimization [78.36795688238155]
This work provides a differentiable and scalable algorithm for dependence minimization that goes beyond linear pairwise decorrelation. We demonstrate its utility in three applications: extending PCA to nonlinear decorrelation, improving the generalization of image classification methods, and preventing dimensional collapse in self-supervised representation learning.
arXiv Detail & Related papers (2025-02-05T14:43:40Z)
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data. This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Unveiling Multiple Descents in Unsupervised Autoencoders [25.244065166421517]
We show for the first time that double and triple descent can be observed with nonlinear unsupervised autoencoders. Through extensive experiments on both synthetic and real datasets, we uncover model-wise, epoch-wise, and sample-wise double descent.
arXiv Detail & Related papers (2024-06-17T16:24:23Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Posterior Collapse and Latent Variable Non-identifiability [54.842098835445]
We propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
arXiv Detail & Related papers (2023-01-02T06:16:56Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- On the Regularization of Autoencoders [14.46779433267854]
We show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model-capacity of the learned autoencoder. We derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last layer. We demonstrate that it is an accurate approximation across all model-ranks in our experiments on three well-known data sets.
arXiv Detail & Related papers (2021-10-21T18:28:25Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization. We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)