Related papers: Non-identifiability distinguishes Neural Networks among Parametric Models

Non-identifiability distinguishes Neural Networks among Parametric Models

URL: http://arxiv.org/abs/2504.18017v1
Date: Fri, 25 Apr 2025 01:59:02 GMT
Title: Non-identifiability distinguishes Neural Networks among Parametric Models
Authors: Sourav Chatterjee, Timothy Sudijono,
Abstract summary: We prove a pair of results which distinguish feedforward neural networks among parametric models at the population level.<n>Our results suggest that a lack of identifiability distinguishes neural networks among the class of smooth parametric models.
Score: 5.678271181959529
License: http://creativecommons.org/licenses/by/4.0/
Abstract: One of the enduring problems surrounding neural networks is to identify the factors that differentiate them from traditional statistical models. We prove a pair of results which distinguish feedforward neural networks among parametric models at the population level, for regression tasks. Firstly, we prove that for any pair of random variables $(X,Y)$, neural networks always learn a nontrivial relationship between $X$ and $Y$, if one exists. Secondly, we prove that for reasonable smooth parametric models, under local and global identifiability conditions, there exists a nontrivial $(X,Y)$ pair for which the parametric model learns the constant predictor $\mathbb{E}[Y]$. Together, our results suggest that a lack of identifiability distinguishes neural networks among the class of smooth parametric models.

Related papers

Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations. We show that the latter can be quantified via the $C1$-metric entropy of the class $mathcal F$. We then apply this general framework to the setting of $Ck$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $mathcal F$: $Ck$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z)
Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model. We show that the LASSO estimator can stably reconstruct the neural network and identify $mathcalSstar$ when the number of samples scales logarithmically. Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z)
The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
arXiv Detail & Related papers (2023-02-02T05:00:29Z)
On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs) We show the causal direction is identifiable up to some pathological cases. We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z)
Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters. We find that our approach successfully generates parameters for a wide range of loss prompts. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI. We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells. We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z)
The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n-1/2+delta$ for an arbitrarily small constant $delta$. The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss. We examine how these benign overfitting phenomena occur in a two-layer neural network setting. We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training, then its Sobolev-seminorm is lower-bounded. Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs) We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.