Geometry-induced Implicit Regularization in Deep ReLU Neural Networks
- URL: http://arxiv.org/abs/2402.08269v1
- Date: Tue, 13 Feb 2024 07:49:57 GMT
- Title: Geometry-induced Implicit Regularization in Deep ReLU Neural Networks
- Authors: Joachim Bona-Pellissier (IMT), François Malgouyres (IMT), François Bachoc (IMT)
- Abstract summary: Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that neural networks with many more parameters than training
examples do not overfit. Implicit regularization phenomena, which are still not
well understood, occur during optimization and 'good' networks are favored.
Thus the number of parameters is not an adequate measure of complexity if we do
not consider all possible networks but only the 'good' ones. To better
understand which networks are favored during optimization, we study the
geometry of the output set as parameters vary. When the inputs are fixed, we
prove that the dimension of this set changes and that the local dimension,
called batch functional dimension, is almost surely determined by the
activation patterns in the hidden layers. We prove that the batch functional
dimension is invariant to the symmetries of the network parameterization:
neuron permutations and positive rescalings. Empirically, we establish that the
batch functional dimension decreases during optimization. As a consequence,
optimization leads to parameters with low batch functional dimensions. We call
this phenomenon geometry-induced implicit regularization. The batch functional
dimension depends on both the network parameters and inputs. To understand the
impact of the inputs, we study, for fixed parameters, the largest attainable
batch functional dimension when the inputs vary. We prove that this quantity,
called computable full functional dimension, is also invariant to the
symmetries of the network's parameterization, and is determined by the
achievable activation patterns. We also provide a sampling theorem, showing a
fast convergence of the estimation of the computable full functional dimension
for a random input of increasing size. Empirically we find that the computable
full functional dimension remains close to the number of parameters, which is
related to the notion of local identifiability. This differs from the observed
values for the batch functional dimension computed on training inputs and test
inputs. The latter are influenced by geometry-induced implicit regularization.
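As a rough numerical illustration of the quantities above, the sketch below (ours, not the authors' code) estimates the batch functional dimension of a small ReLU network as the rank of the Jacobian of the map from the flattened parameters to the stacked outputs on a fixed input batch, and extracts the hidden-layer activation patterns that, per the abstract, determine this rank almost surely. The network, its initialization, and the helper names are made up for the example.

```python
# Minimal sketch (not the authors' code): batch functional dimension estimated
# as the numerical rank of the Jacobian of the stacked outputs w.r.t. the
# flattened parameters, for a fixed input batch X.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, widths):
    # widths = [d_in, h_1, ..., d_out]; Gaussian init, for illustration only.
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, k_w, k_b = jax.random.split(key, 3)
        params.append((jax.random.normal(k_w, (d_out, d_in)) / jnp.sqrt(d_in),
                       0.1 * jax.random.normal(k_b, (d_out,))))
    return params

def forward(params, X):
    # ReLU on every hidden layer, linear output layer; X has shape (n, d_in).
    h = X
    for W, b in params[:-1]:
        h = jnp.maximum(h @ W.T + b, 0.0)
    W, b = params[-1]
    return h @ W.T + b

def batch_functional_dimension(params, X):
    # Rank of the Jacobian of theta -> f_theta(X), with theta flattened.
    theta, unravel = ravel_pytree(params)
    jac = jax.jacobian(lambda t: forward(unravel(t), X).ravel())(theta)
    return int(jnp.linalg.matrix_rank(jac))   # jac shape: (n * d_out, n_params)

def activation_patterns(params, X):
    # Sign patterns of the hidden pre-activations; per the paper, these
    # determine the batch functional dimension almost surely.
    patterns, h = [], X
    for W, b in params[:-1]:
        pre = h @ W.T + b
        patterns.append(pre > 0)
        h = jnp.maximum(pre, 0.0)
    return patterns

params = init_params(jax.random.PRNGKey(0), [3, 8, 8, 2])
X = jax.random.normal(jax.random.PRNGKey(1), (20, 3))
print(batch_functional_dimension(params, X))   # at most min(20 * 2, n_params)
```

Evaluating the same rank on random input batches of increasing size would, following the paper's sampling theorem, give an estimate of the computable full functional dimension.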
Related papers
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
- Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks [82.03459331544737]
We study the performance of ConvResNeXts, trained with weight decay from the perspective of nonparametric classification.
Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks.
arXiv Detail & Related papers (2023-07-04T11:08:03Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Any-dimensional equivariant neural networks [1.4469725791865984]
Traditional supervised learning aims to learn an unknown mapping by fitting a function to a set of input-output pairs with a fixed dimension.
We leverage a newly-discovered phenomenon in algebraic topology, called representation stability, to define equivariant neural networks that can be trained with data in a fixed dimension and then extended to inputs of any dimension.
arXiv Detail & Related papers (2023-06-10T00:55:38Z)
- Hidden symmetries of ReLU networks [17.332539115959708]
In some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries.
In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries.
arXiv Detail & Related papers (2023-06-09T18:07:06Z)
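The permutation and positive-rescaling symmetries discussed in the entry above, whose invariance the main paper proves for the batch functional dimension, are easy to check numerically. A minimal sketch for a one-hidden-layer ReLU network, with made-up sizes and values:

```python
import jax
import jax.numpy as jnp

# Sketch (our illustration): permuting hidden neurons and positively
# rescaling them leave the realized function of a ReLU network unchanged.
def net(W1, b1, W2, b2, X):
    return jnp.maximum(X @ W1.T + b1, 0.0) @ W2.T + b2

k1, k2, k3, k4, k5, k6 = jax.random.split(jax.random.PRNGKey(0), 6)
W1 = jax.random.normal(k1, (6, 3)); b1 = jax.random.normal(k2, (6,))
W2 = jax.random.normal(k3, (2, 6)); b2 = jax.random.normal(k4, (2,))
X = jax.random.normal(k5, (10, 3))
y = net(W1, b1, W2, b2, X)

# Positive rescaling: (w_j, b_j) -> (c_j w_j, c_j b_j), outgoing column / c_j.
c = jnp.exp(jax.random.normal(k6, (6,)))          # arbitrary positive factors
print(jnp.allclose(y, net(W1 * c[:, None], b1 * c, W2 / c[None, :], b2, X),
                   atol=1e-4))

# Permutation: reorder hidden neurons and the matching output columns.
perm = jnp.array([3, 1, 5, 0, 2, 4])
print(jnp.allclose(y, net(W1[perm], b1[perm], W2[:, perm], b2, X), atol=1e-4))
```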
- Functional dimension of feedforward ReLU neural networks [0.0]
We show that functional dimension is inhomogeneous across the parameter space of ReLU neural network functions.
We also study the quotient space and fibers of the realization map from parameter space to function space.
arXiv Detail & Related papers (2022-09-08T21:30:16Z)
- A Functional Perspective on Learning Symmetric Functions with Neural Networks [48.80300074254758]
We study the learning and representation of neural networks defined on measures.
We establish approximation and generalization bounds under different choices of regularization.
The resulting models can be learned efficiently and enjoy generalization guarantees that extend across input sizes.
arXiv Detail & Related papers (2020-08-16T16:34:33Z)
- Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited [36.712632126776285]
Neural networks appear to have mysterious generalization properties when parameter counting is used as a proxy for complexity.
We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data.
arXiv Detail & Related papers (2020-03-04T15:39:27Z)
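For the effective-dimensionality measure referenced in the entry above, a minimal sketch assuming the common definition N_eff(H, z) = sum_i lambda_i / (lambda_i + z) over the eigenvalues lambda_i of the loss Hessian H; the damping constant z and the toy spectrum below are made up:

```python
import numpy as np

# Sketch assuming the usual definition: N_eff(H, z) = sum_i lam_i / (lam_i + z),
# computed over the eigenvalues lam_i of the loss Hessian H. Directions with
# eigenvalues well above z count as full dimensions; the rest are damped.
def effective_dimensionality(hessian_eigenvalues, z=1.0):
    lam = np.asarray(hessian_eigenvalues, dtype=float)
    return float(np.sum(lam / (lam + z)))

# Toy spectrum: a few sharp directions dominate and most are nearly flat, so
# the effective dimension is far below the raw parameter count of 100.
spectrum = np.concatenate([np.full(5, 100.0), np.full(95, 1e-3)])
print(effective_dimensionality(spectrum))   # roughly 5, not 100
```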
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks, but learning a separate parameter for every interaction of every order comes at an exponential cost.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
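To make the tensor parameterization in the last entry concrete, a generic CP-decomposition sketch (our construction, not necessarily the paper's exact model): a rank-R factorization lets the model score a rank-one feature tensor without ever materializing the full order-M weight tensor.

```python
import numpy as np

# Generic CP sketch: an order-M weight tensor W = sum_r a_r^(1) o ... o a_r^(M)
# applied to a rank-one feature tensor phi_1 o ... o phi_M reduces to
# sum_r prod_m <a_r^(m), phi_m>, so W itself is never formed explicitly.
def cp_score(factors, phis):
    # factors: M arrays of shape (R, d_m); phis: M feature vectors of shape (d_m,)
    prod = np.ones(factors[0].shape[0])
    for A, phi in zip(factors, phis):
        prod *= A @ phi                      # per-mode inner products, shape (R,)
    return float(prod.sum())

rng = np.random.default_rng(0)
M, R, d = 3, 4, 10
factors = [rng.normal(size=(R, d)) for _ in range(M)]
phis = [rng.normal(size=d) for _ in range(M)]
print(cp_score(factors, phis))
```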