An Embedding of ReLU Networks and an Analysis of their Identifiability
- URL: http://arxiv.org/abs/2107.09370v1
- Date: Tue, 20 Jul 2021 09:43:31 GMT
- Title: An Embedding of ReLU Networks and an Analysis of their Identifiability
- Authors: Pierre Stock and Rémi Gribonval
- Abstract summary: This paper introduces an embedding for ReLU neural networks of any depth, $\Phi(\theta)$, that is invariant to scalings.
We derive some conditions under which a deep ReLU network is indeed locally identifiable from the knowledge of the realization.
- Score: 5.076419064097734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks with the Rectified Linear Unit (ReLU) nonlinearity are
described by a vector of parameters $\theta$, and realized as a piecewise
linear continuous function $R_{\theta}: x \in \mathbb R^{d} \mapsto
R_{\theta}(x) \in \mathbb R^{k}$. Natural scaling and permutation operations
on the parameters $\theta$ leave the realization unchanged, leading to
equivalence classes of parameters that yield the same realization. These
considerations in turn lead to the notion of identifiability -- the ability to
recover (the equivalence class of) $\theta$ from the sole knowledge of its
realization $R_{\theta}$. The overall objective of this paper is to introduce
an embedding for ReLU neural networks of any depth, $\Phi(\theta)$, that is
invariant to scalings and that provides a locally linear parameterization of
the realization of the network. Leveraging these two key properties, we derive
some conditions under which a deep ReLU network is indeed locally identifiable
from the knowledge of the realization on a finite set of samples $x_{i} \in
\mathbb R^{d}$. We study the shallow case in more depth, establishing necessary
and sufficient conditions for the network to be identifiable from a bounded
subset $\mathcal X \subseteq \mathbb R^{d}$.
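As a concrete illustration of the invariances described in the abstract (this is not code from the paper; the network sizes and variable names are arbitrary), the following NumPy sketch checks numerically that rescaling the incoming weights and bias of each hidden neuron by some $\lambda_i > 0$ while dividing its outgoing weights by $\lambda_i$, or permuting the hidden neurons, leaves the realization $R_{\theta}$ unchanged for a one-hidden-layer network.

```python
import numpy as np

# Illustrative numerical check (not code from the paper): for a one-hidden-layer
# ReLU network R_theta(x) = W2 @ relu(W1 @ x + b1) + b2, rescaling the incoming
# weights and bias of each hidden neuron by lambda_i > 0 while dividing its
# outgoing weights by lambda_i, or permuting the hidden neurons, leaves the
# realization unchanged. These are the "natural scalings and permutations"
# behind the equivalence classes in the abstract; all sizes below are arbitrary.

rng = np.random.default_rng(0)
d, h, k = 4, 6, 3  # input, hidden and output dimensions (arbitrary choices)
W1, b1 = rng.standard_normal((h, d)), rng.standard_normal(h)
W2, b2 = rng.standard_normal((k, h)), rng.standard_normal(k)

def realization(W1, b1, W2, b2, x):
    """Piecewise linear continuous map R_theta: R^d -> R^k."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.standard_normal(d)
y = realization(W1, b1, W2, b2, x)

# Neuron-wise positive rescaling: lambda_i on row i of W1 and on b1_i, 1/lambda_i
# on column i of W2 (ReLU is positively homogeneous, so the output is unchanged).
lam = rng.uniform(0.5, 2.0, size=h)
W1_s, b1_s, W2_s = lam[:, None] * W1, lam * b1, W2 / lam[None, :]
assert np.allclose(y, realization(W1_s, b1_s, W2_s, b2, x))

# Permutation of the hidden neurons (applied on top of the rescaled parameters).
perm = rng.permutation(h)
W1_p, b1_p, W2_p = W1_s[perm], b1_s[perm], W2_s[:, perm]
assert np.allclose(y, realization(W1_p, b1_p, W2_p, b2, x))

print("realization unchanged under rescaling and permutation:", y)
```

Parameters related by such rescalings belong to the same equivalence class, and the embedding $\Phi(\theta)$ introduced in the paper is constructed to be invariant to these scalings, so identifiability is naturally stated up to this equivalence.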
Related papers
- New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $L^{p}$ functions.
We broaden our results to show that smooth invertible neural networks can approximate $L^{p}(\mathbb R^{d}, \mathbb R^{d})$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z)
- Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_{*}(\boldsymbol{x}) = \sigma_{*}\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Function approximation by deep neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_{\beta}$-smooth functions can be approximated by neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$.
The depth, width and the number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as the networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z)
- Affine symmetries and neural network identifiability [0.0]
We consider arbitrary nonlinearities with potentially complicated affine symmetries.
We show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$.
arXiv Detail & Related papers (2020-06-21T07:09:30Z)