Geometry-induced Implicit Regularization in Deep ReLU Neural Networks
- URL: http://arxiv.org/abs/2402.08269v1
- Date: Tue, 13 Feb 2024 07:49:57 GMT
- Title: Geometry-induced Implicit Regularization in Deep ReLU Neural Networks
- Authors: Joachim Bona-Pellissier (IMT), François Malgouyres (IMT), François Bachoc (IMT)
- Abstract summary: Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that neural networks with many more parameters than training
examples do not overfit. Implicit regularization phenomena, which are still not
well understood, occur during optimization and 'good' networks are favored.
Thus the number of parameters is not an adequate measure of complexity if we do
not consider all possible networks but only the 'good' ones. To better
understand which networks are favored during optimization, we study the
geometry of the output set as parameters vary. When the inputs are fixed, we
prove that the dimension of this set changes and that the local dimension,
called batch functional dimension, is almost surely determined by the
activation patterns in the hidden layers. We prove that the batch functional
dimension is invariant to the symmetries of the network parameterization:
neuron permutations and positive rescalings. Empirically, we establish that the
batch functional dimension decreases during optimization. As a consequence,
optimization leads to parameters with low batch functional dimensions. We call
this phenomenon geometry-induced implicit regularization. The batch functional
dimension depends on both the network parameters and inputs. To understand the
impact of the inputs, we study, for fixed parameters, the largest attainable
batch functional dimension when the inputs vary. We prove that this quantity,
called computable full functional dimension, is also invariant to the
symmetries of the network's parameterization, and is determined by the
achievable activation patterns. We also provide a sampling theorem, showing a
fast convergence of the estimation of the computable full functional dimension
for a random input of increasing size. Empirically we find that the computable
full functional dimension remains close to the number of parameters, which is
related to the notion of local identifiability. This differs from the observed
values for the batch functional dimension computed on training inputs and test
inputs. The latter are influenced by geometry-induced implicit regularization.
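As a rough numerical illustration of the quantities above, the sketch below (ours, not the authors' code) estimates the batch functional dimension of a small ReLU network as the rank of the Jacobian of the map from the flattened parameters to the stacked outputs on a fixed input batch, and extracts the hidden-layer activation patterns that, per the abstract, determine this rank almost surely. The network, its initialization, and the helper names are made up for the example.

```python
# Minimal sketch (not the authors' code): batch functional dimension estimated
# as the numerical rank of the Jacobian of the stacked outputs w.r.t. the
# flattened parameters, for a fixed input batch X.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, widths):
    # widths = [d_in, h_1, ..., d_out]; Gaussian init, for illustration only.
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, k_w, k_b = jax.random.split(key, 3)
        params.append((jax.random.normal(k_w, (d_out, d_in)) / jnp.sqrt(d_in),
                       0.1 * jax.random.normal(k_b, (d_out,))))
    return params

def forward(params, X):
    # ReLU on every hidden layer, linear output layer; X has shape (n, d_in).
    h = X
    for W, b in params[:-1]:
        h = jnp.maximum(h @ W.T + b, 0.0)
    W, b = params[-1]
    return h @ W.T + b

def batch_functional_dimension(params, X):
    # Rank of the Jacobian of theta -> f_theta(X), with theta flattened.
    theta, unravel = ravel_pytree(params)
    jac = jax.jacobian(lambda t: forward(unravel(t), X).ravel())(theta)
    return int(jnp.linalg.matrix_rank(jac))   # jac shape: (n * d_out, n_params)

def activation_patterns(params, X):
    # Sign patterns of the hidden pre-activations; per the paper, these
    # determine the batch functional dimension almost surely.
    patterns, h = [], X
    for W, b in params[:-1]:
        pre = h @ W.T + b
        patterns.append(pre > 0)
        h = jnp.maximum(pre, 0.0)
    return patterns

params = init_params(jax.random.PRNGKey(0), [3, 8, 8, 2])
X = jax.random.normal(jax.random.PRNGKey(1), (20, 3))
print(batch_functional_dimension(params, X))   # at most min(20 * 2, n_params)
```

Evaluating the same rank on random input batches of increasing size would, following the paper's sampling theorem, give an estimate of the computable full functional dimension.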
Related papers
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
- Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks [82.03459331544737]
We study the performance of ConvResNeXts, trained with weight decay from the perspective of nonparametric classification.
Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks.
arXiv Detail & Related papers (2023-07-04T11:08:03Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Any-dimensional equivariant neural networks [1.4469725791865984]
Traditional supervised learning aims to learn an unknown mapping by fitting a function to a set of input-output pairs with a fixed dimension.
We leverage a newly-discovered phenomenon in algebraic topology, called representation stability, to define equivariant neural networks that can be trained with data in a fixed dimension and then extended to inputs of any dimension.
arXiv Detail & Related papers (2023-06-10T00:55:38Z)
- Hidden symmetries of ReLU networks [17.332539115959708]
In some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries.
In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries.
arXiv Detail & Related papers (2023-06-09T18:07:06Z)
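The permutation and positive-rescaling symmetries discussed in the entry above, whose invariance the main paper proves for the batch functional dimension, are easy to check numerically. A minimal sketch for a one-hidden-layer ReLU network, with made-up sizes and values:

```python
import jax
import jax.numpy as jnp

# Sketch (our illustration): permuting hidden neurons and positively
# rescaling them leave the realized function of a ReLU network unchanged.
def net(W1, b1, W2, b2, X):
    return jnp.maximum(X @ W1.T + b1, 0.0) @ W2.T + b2

k1, k2, k3, k4, k5, k6 = jax.random.split(jax.random.PRNGKey(0), 6)
W1 = jax.random.normal(k1, (6, 3)); b1 = jax.random.normal(k2, (6,))
W2 = jax.random.normal(k3, (2, 6)); b2 = jax.random.normal(k4, (2,))
X = jax.random.normal(k5, (10, 3))
y = net(W1, b1, W2, b2, X)

# Positive rescaling: (w_j, b_j) -> (c_j w_j, c_j b_j), outgoing column / c_j.
c = jnp.exp(jax.random.normal(k6, (6,)))          # arbitrary positive factors
print(jnp.allclose(y, net(W1 * c[:, None], b1 * c, W2 / c[None, :], b2, X),
                   atol=1e-4))

# Permutation: reorder hidden neurons and the matching output columns.
perm = jnp.array([3, 1, 5, 0, 2, 4])
print(jnp.allclose(y, net(W1[perm], b1[perm], W2[:, perm], b2, X), atol=1e-4))
```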
- Functional dimension of feedforward ReLU neural networks [0.0]
We show that functional dimension is inhomogeneous across the parameter space of ReLU neural network functions.
We also study the quotient space and fibers of the realization map from parameter space to function space.
arXiv Detail & Related papers (2022-09-08T21:30:16Z)
- A Functional Perspective on Learning Symmetric Functions with Neural Networks [48.80300074254758]
We study the learning and representation of neural networks defined on measures.
We establish approximation and generalization bounds under different choices of regularization.
The resulting models can be learned efficiently and enjoy generalization guarantees that extend across input sizes.
arXiv Detail & Related papers (2020-08-16T16:34:33Z)
- Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited [36.712632126776285]
Neural networks appear to have mysterious generalization properties when parameter counting is used as a proxy for complexity.
We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data.
arXiv Detail & Related papers (2020-03-04T15:39:27Z)
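For the effective-dimensionality measure referenced in the entry above, a minimal sketch assuming the common definition N_eff(H, z) = sum_i lambda_i / (lambda_i + z) over the eigenvalues lambda_i of the loss Hessian H; the damping constant z and the toy spectrum below are made up:

```python
import numpy as np

# Sketch assuming the usual definition: N_eff(H, z) = sum_i lam_i / (lam_i + z),
# computed over the eigenvalues lam_i of the loss Hessian H. Directions with
# eigenvalues well above z count as full dimensions; the rest are damped.
def effective_dimensionality(hessian_eigenvalues, z=1.0):
    lam = np.asarray(hessian_eigenvalues, dtype=float)
    return float(np.sum(lam / (lam + z)))

# Toy spectrum: a few sharp directions dominate and most are nearly flat, so
# the effective dimension is far below the raw parameter count of 100.
spectrum = np.concatenate([np.full(5, 100.0), np.full(95, 1e-3)])
print(effective_dimensionality(spectrum))   # roughly 5, not 100
```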
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks, but learning a separate parameter for every interaction of every order comes at an exponential cost.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
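To make the tensor parameterization in the last entry concrete, a generic CP-decomposition sketch (our construction, not necessarily the paper's exact model): a rank-R factorization lets the model score a rank-one feature tensor without ever materializing the full order-M weight tensor.

```python
import numpy as np

# Generic CP sketch: an order-M weight tensor W = sum_r a_r^(1) o ... o a_r^(M)
# applied to a rank-one feature tensor phi_1 o ... o phi_M reduces to
# sum_r prod_m <a_r^(m), phi_m>, so W itself is never formed explicitly.
def cp_score(factors, phis):
    # factors: M arrays of shape (R, d_m); phis: M feature vectors of shape (d_m,)
    prod = np.ones(factors[0].shape[0])
    for A, phi in zip(factors, phis):
        prod *= A @ phi                      # per-mode inner products, shape (R,)
    return float(prod.sum())

rng = np.random.default_rng(0)
M, R, d = 3, 4, 10
factors = [rng.normal(size=(R, d)) for _ in range(M)]
phis = [rng.normal(size=d) for _ in range(M)]
print(cp_score(factors, phis))
```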