Landscape Complexity for the Empirical Risk of Generalized Linear Models: Discrimination between Structured Data
- URL: http://arxiv.org/abs/2503.14403v1
- Date: Tue, 18 Mar 2025 16:44:33 GMT
- Title: Landscape Complexity for the Empirical Risk of Generalized Linear Models: Discrimination between Structured Data
- Authors: Theodoros G. Tsironis, Aris L. Moustakas
- Abstract summary: We use the Kac-Rice formula and results from random matrix theory to obtain the average number of critical points of a family of high-dimensional empirical loss functions. The correlations are introduced to model the existence of structure in the data, as is common in current Machine-Learning systems. For completeness, we also treat the case of a loss function used in training Generalized Linear Models in the presence of correlated input data.
- Score: 2.486161976966064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We use the Kac-Rice formula and results from random matrix theory to obtain the average number of critical points of a family of high-dimensional empirical loss functions, where the data are correlated $d$-dimensional Gaussian vectors, whose number has a fixed ratio with their dimension. The correlations are introduced to model the existence of structure in the data, as is common in current Machine-Learning systems. Under a technical hypothesis, our results are exact in the large-$d$ limit, and characterize the annealed landscape complexity, namely the logarithm of the expected number of critical points at a given value of the loss. We first address in detail the landscape of the loss function of a single perceptron and then generalize it to the case where two competing data sets with different covariance matrices are present, with the perceptron seeking to discriminate between them. The latter model can be applied to understand the interplay between adversity and non-trivial data structure. For completeness, we also treat the case of a loss function used in training Generalized Linear Models in the presence of correlated input data.
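For orientation, the two objects at the center of the abstract above can be stated in a standard generic form; the notation below is a common textbook convention, not necessarily the paper's.

```latex
% Annealed complexity: exponential growth rate of the expected number
% of critical points Crit_d(u) of the loss at value at most u.
\[
  \Sigma(u) \;=\; \lim_{d\to\infty} \frac{1}{d}\,
  \log \mathbb{E}\!\left[\mathrm{Crit}_d(u)\right]
\]
% Kac-Rice formula for the expected count, with p_{\nabla L}(0; w) the
% density of the gradient of L evaluated at zero:
\[
  \mathbb{E}\!\left[\mathrm{Crit}_d(u)\right]
  = \int \mathbb{E}\!\left[\,\bigl|\det \nabla^2 L(w)\bigr|\,
      \mathbf{1}\{L(w)\le u\} \,\middle|\, \nabla L(w)=0 \right]
    p_{\nabla L}(0\,; w)\,\mathrm{d}w
\]
```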
Related papers
- An Effective Gram Matrix Characterizes Generalization in Deep Networks [22.314071077213935]
We derive a differential equation that governs the evolution of the generalization gap when a deep network is trained by gradient descent.
We analyze this differential equation to compute an "effective Gram matrix" that characterizes the generalization gap after training.
arXiv Detail & Related papers (2025-04-23T06:24:42Z)
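The "effective Gram matrix" in the entry above is derived analytically; a minimal empirical analogue, for intuition only, is the Gram matrix of per-sample loss gradients. Everything below (the linear model, the squared loss, all sizes) is an illustrative assumption, not the paper's construction.

```python
# Minimal sketch (not the paper's construction): the Gram matrix of
# per-sample loss gradients for a linear model, a common empirical
# proxy when studying how training couples examples.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(d)

# Per-sample gradient of the squared loss 0.5*(x_i.w - y_i)^2 w.r.t. w.
residuals = X @ w - y               # shape (n,)
grads = residuals[:, None] * X      # shape (n, d); row i = grad of loss_i

# Gram matrix of pairwise gradient inner products.
G = grads @ grads.T                 # shape (n, n)
print(G.shape, np.linalg.matrix_rank(G))
```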
- Causal Discovery on Dependent Binary Data [6.464898093190062]
We propose a decorrelation-based approach for causal graph learning on dependent binary data.
We develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables.
We demonstrate that the proposed decorrelation approach significantly improves the accuracy in causal graph learning.
arXiv Detail & Related papers (2024-12-28T21:55:42Z)
- The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets [2.07180164747172]
We study universal traits which emerge both in real-world complex datasets, as well as in artificially generated ones.
Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure.
arXiv Detail & Related papers (2023-06-26T18:01:47Z)
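As a concrete instance of the RMT toolkit mentioned in the entry above, the sketch below compares the eigenvalue spectrum of the sample covariance of unstructured Gaussian data against the Marchenko-Pastur support, the standard RMT null model; structured (correlated) data deviate from it. The sizes and the choice of null model here are illustrative assumptions.

```python
# Eigenvalues of the sample covariance of i.i.d. Gaussian data,
# compared with the Marchenko-Pastur spectral edges.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 500                    # samples, dimension
q = d / n
X = rng.standard_normal((n, d))
cov = X.T @ X / n
eigs = np.linalg.eigvalsh(cov)

# Marchenko-Pastur support for unit-variance entries:
# [(1 - sqrt(q))^2, (1 + sqrt(q))^2].
lo, hi = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
print(f"empirical range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"MP support:      [{lo:.3f}, {hi:.3f}]")
```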
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Learning from few examples with nonlinear feature maps [68.8204255655161]
We explore the phenomenon of learning from few examples and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of the data distributions, and the model's generalisation capabilities.
The main thrust of our present analysis is on the influence of nonlinear feature transformations mapping original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities.
arXiv Detail & Related papers (2022-03-31T10:36:50Z)
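One standard example of the nonlinear, dimension-raising feature maps discussed in the entry above is the random Fourier feature map of Rahimi and Recht, which lifts data into a higher-dimensional space approximating the infinite-dimensional RBF kernel. The sketch below is illustrative only, not the paper's specific construction.

```python
# Random Fourier features: a nonlinear map to D dimensions whose inner
# products approximate the RBF kernel exp(-gamma * ||x - y||^2).
import numpy as np

rng = np.random.default_rng(2)
d, D, gamma = 5, 200, 0.5           # input dim, feature dim, kernel width

W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(X):
    """Map (n, d) inputs to (n, D) random Fourier features."""
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

x1, x2 = rng.standard_normal((2, d))
approx = phi(x1[None]) @ phi(x2[None]).T           # feature inner product
exact = np.exp(-gamma * np.sum((x1 - x2) ** 2))    # RBF kernel value
print(approx.item(), exact)
```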
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
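A minimal sketch of the kind of ensemble analysed in the entry above: fit a linear discriminant in each of several random projections of the data and average the votes. The dimensions, ensemble size, synthetic data, and use of scikit-learn's LDA are all illustrative assumptions.

```python
# Ensemble of linear discriminants fit in random k-dimensional
# projections of d-dimensional data, combined by average vote.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n, d, k, m = 200, 100, 10, 25       # samples, dim, projection dim, ensemble size

X = rng.standard_normal((n, d))
y = (rng.random(n) < 0.5).astype(int)
X[y == 1] += 0.5                    # shift class 1 to make it separable

def ensemble_predict(X_tr, y_tr, X_te):
    votes = np.zeros(len(X_te))
    for _ in range(m):
        R = rng.standard_normal((d, k)) / np.sqrt(k)   # random projection
        lda = LinearDiscriminantAnalysis().fit(X_tr @ R, y_tr)
        votes += lda.predict(X_te @ R)
    return (votes / m > 0.5).astype(int)

pred = ensemble_predict(X[:150], y[:150], X[150:])
print("test accuracy:", np.mean(pred == y[150:]))
```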
- Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves a linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)
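The convergence claim in the entry above rests on a standard fact: for an objective that is strongly convex and smooth near the minimizer, gradient descent contracts the distance to the minimizer geometrically. In generic notation (not the paper's):

```latex
% Standard fact: if f is mu-strongly convex and L-smooth near the
% minimizer theta*, gradient descent with step size 1/L satisfies
\[
  \theta_{t+1} = \theta_t - \tfrac{1}{L}\,\nabla f(\theta_t),
  \qquad
  \|\theta_{t+1} - \theta^\ast\|^2
  \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\,
  \|\theta_t - \theta^\ast\|^2 ,
\]
% i.e. the error contracts geometrically -- a linear convergence rate.
```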
- Generalisation error in learning with random features and the hidden manifold model [23.71637173968353]
We study generalised linear regression and classification for a synthetically generated dataset.
We consider the high-dimensional regime and use the replica method from statistical physics.
We show how to obtain the so-called double descent behaviour for logistic regression, with a peak at the interpolation threshold.
We discuss the role played by correlations in the data generated by the hidden manifold model.
arXiv Detail & Related papers (2020-02-21T14:49:41Z)
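The double descent behaviour in the entry above can be reproduced qualitatively in a few lines of ridgeless random-features regression; the sketch below uses least squares rather than the paper's logistic regression, and all sizes are illustrative assumptions. The test error typically peaks where the number of features matches the sample size, i.e. near the interpolation threshold.

```python
# Minimal double descent sketch: min-norm least squares on random
# tanh features; test error typically peaks near p = n = 100.
import numpy as np

rng = np.random.default_rng(4)
n, d, n_test = 100, 20, 1000
X = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
w_star = rng.standard_normal(d)
y, y_test = X @ w_star, X_test @ w_star

for p in [25, 50, 100, 200, 400]:   # number of random features
    W = rng.standard_normal((d, p)) / np.sqrt(d)
    F, F_test = np.tanh(X @ W), np.tanh(X_test @ W)
    a = np.linalg.lstsq(F, y, rcond=None)[0]   # min-norm solution
    err = np.mean((F_test @ a - y_test) ** 2)
    print(f"p={p:4d}  test MSE={err:.3f}")
```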
This list is automatically generated from the titles and abstracts of the papers on this site.