On the Lipschitz Constant of Deep Networks and Double Descent
- URL: http://arxiv.org/abs/2301.12309v4
- Date: Tue, 14 Nov 2023 15:48:48 GMT
- Title: On the Lipschitz Constant of Deep Networks and Double Descent
- Authors: Matteo Gamba, Hossein Azizpour, Mårten Björkman
- Abstract summary: Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable.
We present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent.
- Score: 5.381801249240512
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing bounds on the generalization error of deep networks assume some form
of smooth or bounded dependence on the input variable, falling short of
investigating the mechanisms controlling such factors in practice. In this
work, we present an extensive experimental study of the empirical Lipschitz
constant of deep networks undergoing double descent, and highlight
non-monotonic trends strongly correlating with the test error. Building a
connection between parameter-space and input-space gradients for SGD around a
critical point, we isolate two important factors -- namely loss landscape
curvature and distance of parameters from initialization -- respectively
controlling optimization dynamics around a critical point and bounding model
function complexity, even beyond the training data. Our study presents novel
insights on implicit regularization via overparameterization, and effective
model complexity for networks trained in practice.
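The "empirical Lipschitz constant" the abstract refers to is typically estimated as the largest input-gradient norm observed over a finite set of inputs, which lower-bounds the true Lipschitz constant. A minimal numpy sketch of this estimator, using a hypothetical tiny two-layer ReLU network in place of the large models studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights standing in for a trained model: a tiny two-layer
# ReLU network f(x) = W2 @ relu(W1 @ x). The paper studies far larger nets.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((1, 16))

def f(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

def input_gradient(x):
    # For a ReLU net the input-space Jacobian is W2 @ diag(relu'(W1 x)) @ W1,
    # where relu' is the 0/1 activation mask at x.
    mask = (W1 @ x > 0).astype(float)
    return (W2 * mask) @ W1  # shape (1, 8)

def empirical_lipschitz(samples):
    # Lower-bound the Lipschitz constant by the largest input-gradient
    # norm seen over a finite sample of inputs.
    return max(np.linalg.norm(input_gradient(x)) for x in samples)

samples = rng.standard_normal((256, 8))
L_hat = empirical_lipschitz(samples)
print(f"empirical Lipschitz lower bound: {L_hat:.3f}")
```

Because the maximum is taken over finitely many points, `L_hat` can underestimate the global Lipschitz constant; tracking how this quantity evolves with model width or training time is what reveals the non-monotonic trends the paper correlates with test error.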
Related papers
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning [17.756108291816908]
We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z)
- Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
arXiv Detail & Related papers (2023-10-05T21:44:18Z)
- On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations [0.0]
We investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation.
We show that the general overparametrized formulation introduces a set of spurious equilibria which lay outside the set where the loss function is minimized.
arXiv Detail & Related papers (2023-05-17T02:26:34Z)
- Conductivity Imaging from Internal Measurements with Mixed Least-Squares Deep Neural Networks [4.228167013618626]
We develop a novel approach using deep neural networks to reconstruct the conductivity distribution in elliptic problems.
We provide a thorough analysis of the deep neural network approximations of the conductivity for both continuous and empirical losses.
arXiv Detail & Related papers (2023-03-29T04:43:03Z)
- Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent delineates the behaviour of models depending on the regime they belong to.
We use influence functions to derive suitable expressions of the population loss and its lower bound.
Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
- DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.