Injectivity of ReLU networks: perspectives from statistical physics
- URL: http://arxiv.org/abs/2302.14112v1
- Date: Mon, 27 Feb 2023 19:51:42 GMT
- Title: Injectivity of ReLU networks: perspectives from statistical physics
- Authors: Antoine Maillard, Afonso S. Bandeira, David Belius, Ivan Dokmanić,
Shuta Nakajima
- Abstract summary: We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$.
Recent work connects this problem to spherical integral geometry giving rise to a conjectured sharp injectivity threshold for $\alpha = \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set.
We show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics.
- Score: 22.357927193774803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When can the input of a ReLU neural network be inferred from its output? In
other words, when is the network injective? We consider a single layer, $x
\mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in
a high-dimensional setting where $n, m \to \infty$. Recent work connects this
problem to spherical integral geometry giving rise to a conjectured sharp
injectivity threshold for $\alpha = \frac{m}{n}$ by studying the expected Euler
characteristic of a certain random set. We adopt a different perspective and
show that injectivity is equivalent to a property of the ground state of the
spherical perceptron, an important spin glass model in statistical physics. By
leveraging the (non-rigorous) replica symmetry-breaking theory, we derive
analytical equations for the threshold whose solution is at odds with that from
the Euler characteristic. Furthermore, we use Gordon's min--max theorem to
prove that a replica-symmetric upper bound refutes the Euler characteristic
prediction. Along the way we aim to give a tutorial-style introduction to key
ideas from statistical physics in an effort to make the exposition accessible
to a broad audience. Our analysis establishes a connection between spin glasses
and integral geometry but leaves open the problem of explaining the
discrepancies.
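The setup in the abstract is easy to reproduce numerically. The sketch below is a toy illustration under our own assumptions, not the paper's method (the names relu_layer and noninjectivity_witness are ours): it samples a Gaussian $W$ at aspect ratio $\alpha = m/n$ and looks for a unit vector $x$ with $Wx \le 0$ entrywise by minimizing a spherical-perceptron-style energy. Any such $x$ is a simple certificate of non-injectivity, because $\mathrm{ReLU}(Wx) = 0 = \mathrm{ReLU}(W \cdot 0)$; this crude witness typically exists only for small $\alpha$ (below the classical perceptron capacity $\alpha = 2$), so it says nothing about the sharper threshold analysed in the paper.

```python
# Toy numerical sketch (illustrative only, not the paper's analysis):
# sample a Gaussian W at aspect ratio alpha = m/n and search for a
# unit vector x with W x <= 0 entrywise.  Such an x certifies
# non-injectivity, since ReLU(W x) = 0 = ReLU(W 0).
import numpy as np

def relu_layer(W, x):
    """The single-layer map x -> ReLU(W x)."""
    return np.maximum(W @ x, 0.0)

def noninjectivity_witness(W, steps=3000, lr=0.05, seed=0):
    """Projected gradient descent on the unit sphere for the
    spherical-perceptron-style energy E(x) = (1/m) sum_i max(0, <w_i, x>)^2.
    A (near-)zero final energy means the returned x has W x <= 0."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(steps):
        grad = (2.0 / m) * (W.T @ np.maximum(W @ x, 0.0))
        x = x - lr * grad
        x /= np.linalg.norm(x)          # project back onto the sphere
    energy = float(np.mean(np.maximum(W @ x, 0.0) ** 2))
    return x, energy

if __name__ == "__main__":
    n = 100
    rng = np.random.default_rng(1)
    for alpha in (1.0, 1.5, 3.0, 6.0):   # aspect ratio alpha = m / n
        m = int(alpha * n)
        W = rng.standard_normal((m, n))
        x, energy = noninjectivity_witness(W)
        print(f"alpha = {alpha:4.1f}   final energy = {energy:.3e}   "
              f"max entry of ReLU(Wx) = {relu_layer(W, x).max():.3e}")
```

A final energy near zero exhibits a direction that the layer maps to zero, certifying non-injectivity; for larger $\alpha$ the minimisation typically stalls at a positive value, and deciding injectivity there is exactly where the spherical-perceptron ground-state analysis of the paper enters.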
Related papers
- Symmetries, correlation functions, and entanglement of general quantum Motzkin spin-chains [0.029541734875307393]
Motzkin spin-chains, which include 'colorless' (integer spin $s=1$) and 'colorful' ($s \geq 2$) variants, are one-dimensional (1D) local integer spin models.
We analytically discover several unique properties of these models, potentially suggesting a new class of correlations for low-energy physics.
arXiv Detail & Related papers (2024-08-28T18:10:16Z)
- Tempered Calculus for ML: Application to Hyperbolic Model Embedding [70.61101116794549]
Most mathematical distortions used in ML are fundamentally integral in nature.
In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements.
We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding along the hyperbolic vs Euclidean scale.
arXiv Detail & Related papers (2024-02-06T17:21:06Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- One-dimensional pseudoharmonic oscillator: classical remarks and quantum-information theory [0.0]
Motion in a potential that is a combination of positive quadratic and inverse quadratic functions of the position is considered.
The dependence on the particle energy and on the factor $\mathfrak{a}$, which describes the relative strength of its constituents, is described.
arXiv Detail & Related papers (2023-04-13T11:50:51Z)
- Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture (that a fitting ellipsoid exists with high probability for $n$ up to roughly $d^2/4$) up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega(d^2/\mathrm{polylog}(d))$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z)
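For context on the "least squares construction" mentioned in the entry above, here is a minimal sketch of that general idea under our own assumptions (an illustration, not the paper's proof): take the minimum-norm matrix $A$ solving the linear constraints $x_i^\top A x_i = 1$ and check whether it is positive semidefinite, i.e. whether it defines a genuine ellipsoid through the points.

```python
# Minimal sketch (our illustration): minimum-norm solution of the linear
# constraints x_i^T A x_i = 1, followed by a PSD check.
import numpy as np

def least_squares_ellipsoid_fit(X):
    """Minimum-norm matrix A (symmetrized) with x_i^T A x_i = 1 for each row x_i of X."""
    n, d = X.shape
    M = np.stack([np.outer(x, x).ravel() for x in X])      # each row is vec(x_i x_i^T)
    a, *_ = np.linalg.lstsq(M, np.ones(n), rcond=None)     # minimum-norm solution
    A = a.reshape(d, d)
    return 0.5 * (A + A.T)

d, n = 20, 60                                   # n well below ~ d^2
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
A = least_squares_ellipsoid_fit(X)
print("constraints satisfied:", np.allclose(np.einsum("ij,jk,ik->i", X, A, X), 1.0))
print("smallest eigenvalue  :", np.linalg.eigvalsh(A).min())   # >= 0 means an ellipsoid fit
```

Whether such a fit remains positive semidefinite as $n$ grows toward order $d^2$ is the regime the paper analyses.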
- On the Multidimensional Random Subset Sum Problem [0.9007371440329465]
In the Random Subset Sum Problem, given $n$ i.i.d. random variables $X_1, \dots, X_n$, we wish to approximate any point $z \in [-1,1]$ as the sum of a subset $X_{i_1(z)}, \dots, X_{i_s(z)}$ of them, up to error $\varepsilon$.
We prove that, in $d$ dimensions, $n = O(d^3 \log\frac{1}{\varepsilon} \cdot \ldots)$ samples suffice.
arXiv Detail & Related papers (2022-07-28T08:10:43Z)
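As a toy illustration of the one-dimensional version of the problem stated above (a brute-force check of our own, not the paper's $d$-dimensional argument), the snippet below searches all subsets of $n$ i.i.d. Uniform$(-1,1)$ samples for the sum closest to a fixed target $z$.

```python
# Brute-force illustration of 1-D Random Subset Sum: how well can a
# target z in [-1, 1] be approximated by a subset sum of n random samples?
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, z = 16, 0.4321                      # sample size and target point
X = rng.uniform(-1.0, 1.0, size=n)

best_err = min(abs(sum(subset) - z)
               for r in range(n + 1)
               for subset in itertools.combinations(X, r))
print(f"n = {n}:  best |subset sum - z| = {best_err:.2e}")
```

The paper quantifies how many samples are needed for this kind of approximation to hold simultaneously for every target $z$ in $d$ dimensions.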
- Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities [0.0]
We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(\Omega)$ for weighted analytic function classes in certain polytopal domains.
The exponential approximation rates are shown to hold in space dimension $d = 2$ on Lipschitz polygons with straight sides, and in space dimension $d=3$ on Fichera-type polyhedral domains with plane faces.
arXiv Detail & Related papers (2020-10-23T07:44:32Z)
- Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry [9.695960412426672]
We analytically characterize the Hessian at various families of spurious minima.
In particular, we prove that for $d \ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$.
arXiv Detail & Related papers (2020-08-04T20:08:35Z)
- Continuous-time quantum walks in the presence of a quadratic perturbation [55.41644538483948]
We address the properties of continuous-time quantum walks with Hamiltonians of the form $\mathcal{H} = L + \lambda L^2$.
We consider cycle, complete, and star graphs because they are paradigmatic models with low/high connectivity and/or symmetry.
arXiv Detail & Related papers (2020-05-13T14:53:36Z)