Path-metrics, pruning, and generalization
- URL: http://arxiv.org/abs/2405.15006v1
- Date: Thu, 23 May 2024 19:23:09 GMT
- Title: Path-metrics, pruning, and generalization
- Authors: Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval
- Abstract summary: This paper proves a new bound on function distances in terms of the so-called path-metrics of the parameters.
It is the first bound of its kind that is broadly applicable to modern networks such as ResNets, VGGs, U-nets, and many more.
- Score: 13.894485461969772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing the behavior of ReLU neural networks often hinges on understanding the relationships between their parameters and the functions they implement. This paper proves a new bound on function distances in terms of the so-called path-metrics of the parameters. Since this bound is intrinsically invariant with respect to the rescaling symmetries of the networks, it sharpens previously known bounds. It is also, to the best of our knowledge, the first bound of its kind that is broadly applicable to modern networks such as ResNets, VGGs, U-nets, and many more. In contexts such as network pruning and quantization, the proposed path-metrics can be efficiently computed using only two forward passes. Besides its intrinsic theoretical interest, the bound yields not only novel theoretical generalization bounds, but also a promising proof of concept for rescaling-invariant pruning.
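The two-forward-pass claim echoes a classical trick: the ℓ1 path-norm of a bias-free ReLU network can be read off from a single forward pass with absolute-valued weights on an all-ones input. The PyTorch sketch below (the helper name l1_path_norm is ours) only illustrates that forward-pass idea, not the paper's exact path-metric between two parameter vectors.
```python
# Minimal sketch: l1 path-norm of a bias-free ReLU MLP via one forward pass
# of the absolute-valued weights on an all-ones input. Illustration only,
# not the rescaling-invariant path-metric defined in the paper.
import torch
import torch.nn as nn

def l1_path_norm(mlp: nn.Sequential, input_dim: int) -> float:
    """One forward pass of |weights| on an all-ones input (bias-free ReLU MLP)."""
    with torch.no_grad():
        x = torch.ones(1, input_dim)
        for layer in mlp:
            if isinstance(layer, nn.Linear):
                x = x @ layer.weight.abs().T  # replace W by |W|; ReLU is then the identity
    return x.sum().item()

mlp = nn.Sequential(nn.Linear(8, 16, bias=False), nn.ReLU(),
                    nn.Linear(16, 1, bias=False))
print(l1_path_norm(mlp, input_dim=8))
```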
Related papers
- Metric Convolutions: A Unifying Theory to Adaptive Convolutions [3.481985817302898]
Metric convolutions replace standard convolutions in image processing and deep learning.
They require fewer parameters and provide better generalisation.
Our approach shows competitive performance in standard denoising and classification tasks.
arXiv Detail & Related papers (2024-06-08T08:41:12Z) - Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Geometry-induced Implicit Regularization in Deep ReLU Neural Networks [0.0]
Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
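As a concrete reading of "activation patterns", the sketch below (PyTorch; the helper activation_patterns is ours, not from the paper) records the on/off mask of each hidden ReLU unit for a batch of inputs.
```python
# Minimal sketch: extract the binary activation pattern of each ReLU layer
# for a batch of inputs. Names and setup are illustrative assumptions.
import torch
import torch.nn as nn

def activation_patterns(net: nn.Sequential, x: torch.Tensor):
    """Return one {0,1} mask per hidden ReLU layer for the batch x."""
    patterns = []
    with torch.no_grad():
        for layer in net:
            x = layer(x)
            if isinstance(layer, nn.ReLU):
                patterns.append((x > 0).int())  # ReLU(z) > 0 iff the pre-activation z > 0
    return patterns

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    nn.Linear(8, 8), nn.ReLU(),
                    nn.Linear(8, 2))
masks = activation_patterns(net, torch.randn(5, 4))
print([tuple(m.shape) for m in masks])  # [(5, 8), (5, 8)]
```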
arXiv Detail & Related papers (2024-02-13T07:49:57Z) - Adaptive Log-Euclidean Metrics for SPD Matrix Learning [73.12655932115881]
We propose Adaptive Log-Euclidean Metrics (ALEMs), which extend the widely used Log-Euclidean Metric (LEM).
The experimental and theoretical results demonstrate the merit of the proposed metrics in improving the performance of SPD neural networks.
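For reference, the classical Log-Euclidean Metric that ALEMs build on is d(A, B) = ||logm(A) - logm(B)||_F for SPD matrices A and B. A minimal NumPy/SciPy sketch follows (the helper lem_distance is ours; the adaptive variants of the paper are not shown).
```python
# Sketch of the standard Log-Euclidean distance between SPD matrices.
import numpy as np
from scipy.linalg import logm

def lem_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Log-Euclidean distance: ||logm(A) - logm(B)||_F."""
    return float(np.linalg.norm(logm(A) - logm(B), ord="fro"))

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((2, 3, 3))
A = X @ X.T + 3.0 * np.eye(3)  # symmetric positive definite by construction
B = Y @ Y.T + 3.0 * np.eye(3)
print(lem_distance(A, B))
```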
arXiv Detail & Related papers (2023-03-26T18:31:52Z) - A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
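Below is a minimal sketch of a sinusoidal layer with an explicit bandwidth-like scale, in the SIREN style; the class name SineLayer, the parameter omega0, and the initialization heuristic are ours and may differ from the paper's parametrization.
```python
# Hypothetical sketch of a sinusoidal layer: y = sin(omega0 * (W x + b)),
# where omega0 acts as a frequency/bandwidth scale.
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_features, out_features, omega0=30.0):
        super().__init__()
        self.omega0 = omega0
        self.linear = nn.Linear(in_features, out_features)
        # Heuristic: shrink the uniform init by omega0 so pre-activations stay moderate.
        bound = 1.0 / (omega0 * in_features)
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.omega0 * self.linear(x))

net = nn.Sequential(SineLayer(2, 64, omega0=30.0), nn.Linear(64, 1))
print(net(torch.rand(16, 2)).shape)  # torch.Size([16, 1])
```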
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
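One crude way to make "local Lipschitz regularity in the data space" concrete is the norm of the input gradient at a data point; the PyTorch sketch below (the helper input_gradient_norm is ours, not the paper's estimator) computes that proxy per sample.
```python
# Sketch: per-sample input-gradient norm as a local-Lipschitz proxy
# for a scalar-output model. Illustration only.
import torch
import torch.nn as nn

def input_gradient_norm(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample norm of d f(x)/d x for a scalar-output model."""
    x = x.clone().requires_grad_(True)
    y = model(x).sum()                  # samples are independent, so summing is safe
    (grad,) = torch.autograd.grad(y, x)
    return grad.flatten(1).norm(dim=1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
print(input_gradient_norm(model, torch.randn(4, 10)))
```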
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
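For concreteness, the spectral norm of a hidden-layer weight matrix is its largest singular value; the short sketch below (the helper spectral_norm_power_iter is ours) computes it by power iteration and checks it against torch.linalg.matrix_norm.
```python
# Sketch: spectral norm (largest singular value) of a weight matrix
# via power iteration on W^T W, checked against the exact value.
import torch

def spectral_norm_power_iter(W: torch.Tensor, n_iter: int = 50) -> float:
    v = torch.randn(W.shape[1])
    for _ in range(n_iter):
        v = W.T @ (W @ v)   # one step of power iteration on W^T W
        v = v / v.norm()
    return (W @ v).norm().item()

W = torch.randn(64, 32)
print(spectral_norm_power_iter(W), torch.linalg.matrix_norm(W, ord=2).item())
```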
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - A Pairwise Connected Tensor Network Representation of Path Integrals [0.0]
It has been recently shown how the tensorial nature of real-time path integrals involving the Feynman-Vernon influence functional can be utilized.
Here, a generalized tensor network is derived and implemented specifically incorporating the pairwise interaction structure of the influence functional.
This pairwise connected tensor network path integral (PCTNPI) is illustrated through applications to typical spin-boson problems and explorations of the differences caused by the exact form of the spectral density.
arXiv Detail & Related papers (2021-06-28T18:30:17Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
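The "fixed activation pattern" idea can be made concrete: freezing the 0/1 mask of a ReLU layer turns it into a linear map that agrees with the ReLU layer wherever the mask matches the realized pattern. A minimal PyTorch illustration follows (ours, not the paper's construction).
```python
# Sketch: with a fixed 0/1 mask, ReLU(Wx + b) becomes mask * (Wx + b),
# which is linear in x and matches the ReLU layer where the mask is realized.
import torch
import torch.nn as nn

layer = nn.Linear(6, 6)
x = torch.randn(1, 6)

pre = layer(x)
mask = (pre > 0).float()      # activation pattern realized at x
relu_out = torch.relu(pre)    # ordinary ReLU layer
fixed_out = mask * pre        # surrogate layer with the pattern frozen

print(torch.allclose(relu_out, fixed_out))  # True
```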
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Deep connections between learning from limited labels & physical
parameter estimation -- inspiration for regularization [0.0]
We show that explicit regularization of model parameters in PDE constrained optimization translates to regularization of the network output.
A hyperspectral imaging example shows that minimum prior information together with cross-validation for optimal regularization parameters boosts the segmentation accuracy.
arXiv Detail & Related papers (2020-03-17T19:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.