Affine symmetries and neural network identifiability
- URL: http://arxiv.org/abs/2006.11727v2
- Date: Thu, 22 Oct 2020 11:10:03 GMT
- Title: Affine symmetries and neural network identifiability
- Authors: Verner Vlačić and Helmut Bölcskei
- Abstract summary: We consider arbitrary nonlinearities with potentially complicated affine symmetries.
We show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the following question of neural network identifiability: Suppose
we are given a function $f:\mathbb{R}^m\to\mathbb{R}^n$ and a nonlinearity
$\rho$. Can we specify the architecture, weights, and biases of all
feed-forward neural networks with respect to $\rho$ giving rise to $f$?
Existing literature on the subject suggests that the answer should be yes,
provided we are only concerned with finding networks that satisfy certain
"genericity conditions". Moreover, the identified networks are mutually related
by symmetries of the nonlinearity. For instance, the $\tanh$ function is odd,
and so flipping the signs of the incoming and outgoing weights of a neuron does
not change the output map of the network. The results known hitherto, however,
apply either to single-layer networks, or to networks satisfying specific
structural assumptions (such as full connectivity), as well as to specific
nonlinearities. In an effort to answer the identifiability question in greater
generality, we consider arbitrary nonlinearities with potentially complicated
affine symmetries, and we show that the symmetries can be used to find a rich
set of networks giving rise to the same function $f$. The set obtained in this
manner is, in fact, exhaustive (i.e., it contains all networks giving rise to
$f$) unless there exists a network $\mathcal{A}$ "with no internal symmetries"
giving rise to the identically zero function. This result can thus be
interpreted as an analog of the rank-nullity theorem for linear operators. We
furthermore exhibit a class of "$\tanh$-type" nonlinearities (including the
tanh function itself) for which such a network $\mathcal{A}$ does not exist,
thereby solving the identifiability question for these nonlinearities in full
generality. Finally, we show that this class contains nonlinearities with
arbitrarily complicated symmetries.
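The sign-flip symmetry of $\tanh$ mentioned in the abstract can be checked numerically. The sketch below uses a hypothetical single-hidden-layer network with randomly drawn weights (all names and shapes are illustrative, not from the paper): because $\tanh$ is odd, negating a hidden neuron's incoming weights and bias together with its outgoing weights leaves the realized function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer tanh network: x -> W2 @ tanh(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def net(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Flip the sign of neuron 0's incoming weights/bias and its outgoing weights.
# Since tanh is odd (tanh(-t) = -tanh(t)), the two sign flips cancel.
W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
W1f[0], b1f[0], W2f[:, 0] = -W1[0], -b1[0], -W2[:, 0]

x = rng.normal(size=3)
# The two parameterizations realize the same map.
assert np.allclose(net(x, W1, b1, W2, b2), net(x, W1f, b1f, W2f, b2))
```

This is precisely an affine symmetry of the nonlinearity in the paper's sense: $\tanh(-t) = -\tanh(t)$ relates distinct weight configurations that give rise to the same function $f$.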
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layer-wise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Neural Network Approximation of Continuous Functions in High Dimensions
with Applications to Inverse Problems [6.84380898679299]
Current theory predicts that networks should scale exponentially in the dimension of the problem.
We provide a general method for bounding the complexity required for a neural network to approximate a Hölder (or uniformly) continuous function.
arXiv Detail & Related papers (2022-08-28T22:44:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - An Embedding of ReLU Networks and an Analysis of their Identifiability [5.076419064097734]
This paper introduces an embedding for ReLU neural networks of any depth, $\Phi(\theta)$, that is invariant to scalings.
We derive some conditions under which a deep ReLU network is indeed locally identifiable from the knowledge of the realization.
arXiv Detail & Related papers (2021-07-20T09:43:31Z) - Geometry of the Loss Landscape in Overparameterized Neural Networks:
Symmetries and Invariances [9.390008801320024]
We show that adding one extra neuron to each is sufficient to connect all previously discrete minima into a single manifold.
We show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold.
arXiv Detail & Related papers (2021-05-25T21:19:07Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Nonclosedness of Sets of Neural Networks in Sobolev Spaces [0.0]
We show that realized neural networks are not closed in order-$(m-1)$ Sobolev spaces $W^{m-1,p}$ for $p \in [1,\infty]$.
For a real analytic activation function, we show that sets of realized neural networks are not closed in $W^{k,p}$ for any $k \in \mathbb{N}$.
arXiv Detail & Related papers (2020-07-23T00:57:25Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn with gradient descent beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
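The scaling invariance underlying the ReLU identifiability entry above can also be verified directly. The sketch below (illustrative names and shapes, not from that paper) exploits the positive homogeneity of ReLU: for $c > 0$, $\mathrm{relu}(ct) = c\,\mathrm{relu}(t)$, so rescaling a neuron's incoming parameters by $c$ and its outgoing weights by $1/c$ preserves the realized function.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-hidden-layer ReLU network: x -> W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

def relu_net(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Positive homogeneity: relu(c*t) = c*relu(t) for c > 0, so scaling a
# neuron's incoming weights and bias by c and its outgoing weights by 1/c
# leaves the realized function unchanged.
c = 3.7
W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
W1s[2], b1s[2], W2s[:, 2] = c * W1[2], c * b1[2], W2[:, 2] / c

x = rng.normal(size=3)
assert np.allclose(relu_net(x, W1, b1, W2, b2), relu_net(x, W1s, b1s, W2s, b2))
```

This rescaling symmetry is exactly why scaling-invariant embeddings of ReLU parameters are needed before local identifiability can even be posed.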
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.