On the Role of Initialization on the Implicit Bias in Deep Linear Networks
- URL: http://arxiv.org/abs/2402.02454v1
- Date: Sun, 4 Feb 2024 11:54:07 GMT
- Title: On the Role of Initialization on the Implicit Bias in Deep Linear Networks
- Authors: Oria Gruber, Haim Avron
- Abstract summary: This study explores why deep networks that fit the training data perfectly still generalize, a phenomenon attributed to the implicit bias at play.
Various sources of implicit bias have been identified, such as step size, weight initialization, optimization algorithm, and number of parameters.
- Score: 8.272491066698041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite Deep Learning's (DL) empirical success, our theoretical understanding
of its efficacy remains limited. One notable paradox is that while conventional
wisdom discourages perfect data fitting, deep neural networks are designed to
do just that, yet they generalize effectively. This study focuses on exploring
this phenomenon, attributed to the implicit bias at play. Various sources of
implicit bias have been identified, such as step size, weight initialization,
optimization algorithm, and number of parameters. In this work, we focus on
investigating the implicit bias originating from weight initialization. To this
end, we examine the problem of solving underdetermined linear systems in
various contexts, scrutinizing the impact of initialization on the implicit
regularization when using deep networks to solve such systems. Our findings
elucidate the role of initialization in the optimization and generalization
paradoxes, contributing to a more comprehensive understanding of DL's
performance characteristics.
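To make the abstract's setting concrete, here is a minimal sketch (not the authors' code; the model, problem sizes, and step sizes are illustrative). It trains the classic depth-2 "diagonal linear network" reparameterization x = u*u - v*v by gradient descent on an underdetermined system Ax = b. Varying only the initialization scale alpha changes which interpolant is found: a small alpha tends toward a sparse solution, a large alpha toward the minimum-l2-norm one.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 30, 60, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_star = np.zeros(n)                        # sparse planted solution
x_star[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_star
x_l2 = A.T @ np.linalg.solve(A @ A.T, b)    # minimum-l2-norm interpolant

def train(alpha, lr=0.02, steps=30000):
    u = alpha * np.ones(n)                  # only the init scale changes
    v = alpha * np.ones(n)
    for _ in range(steps):
        x = u * u - v * v
        g = A.T @ (A @ x - b)               # gradient of 0.5*||Ax - b||^2 in x
        u, v = u - 2 * lr * u * g, v + 2 * lr * v * g
    return u * u - v * v

for alpha in (0.01, 1.0):
    x = train(alpha)
    print(f"alpha = {alpha}: ||x - sparse x*|| = {np.linalg.norm(x - x_star):.3f}, "
          f"||x - min-norm x|| = {np.linalg.norm(x - x_l2):.3f}")
```

Both runs drive the residual to near zero; only the character of the recovered solution changes with the initialization scale.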
Related papers
- Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
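As a rough illustration of the quantity $\mathcal{I}$-EDL builds on (not the paper's exact objective), the sketch below computes the Fisher information matrix of a Dirichlet distribution with concentration vector alpha, whose (j, k) entry is psi'(alpha_j) * delta_jk - psi'(alpha_0) with psi' the trigamma function, and turns its log-determinant into per-sample loss weights; the weighting scheme is hypothetical.

```python
import numpy as np
from scipy.special import polygamma

def dirichlet_fim_logdet(alpha):
    # FIM of Dirichlet(alpha): diag(trigamma(alpha)) - trigamma(sum(alpha))
    trigamma = lambda x: polygamma(1, x)
    fim = np.diag(trigamma(alpha)) - trigamma(alpha.sum())
    return np.linalg.slogdet(fim)[1]

# Two hypothetical samples: near-uniform evidence vs. confident evidence.
alphas = np.array([[1.1, 1.1, 1.1],
                   [20.0, 1.0, 1.0]])
info = np.array([dirichlet_fim_logdet(a) for a in alphas])
weights = np.exp(info) / np.exp(info).sum()   # illustrative reweighting only
print("FIM log-dets:", np.round(info, 2), "-> loss weights:", np.round(weights, 3))
```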
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
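A minimal illustration of the one-step-ahead prediction task at the heart of system identification, using a classical linear ARX model fit by least squares; the surveyed feedforward, convolutional, and recurrent networks replace this linear predictor with a learned nonlinear one. The simulated system and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
u = rng.standard_normal(T)                     # input signal
y = np.zeros(T)
for t in range(2, T):                          # simulated stable 2nd-order system
    y[t] = 1.5 * y[t-1] - 0.7 * y[t-2] + 1.0 * u[t-1] + 0.5 * u[t-2] \
           + 0.05 * rng.standard_normal()

# Regressors [y[t-1], y[t-2], u[t-1], u[t-2]] predicting y[t]
X = np.column_stack([y[1:-1], y[:-2], u[1:-1], u[:-2]])
target = y[2:]
theta, *_ = np.linalg.lstsq(X, target, rcond=None)
print("estimated coefficients:", np.round(theta, 3))
print("one-step RMSE:", np.sqrt(np.mean((X @ theta - target) ** 2)))
```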
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Towards Size-Independent Generalization Bounds for Deep Operator Nets [0.28123958518740544]
This work aims to advance the theory of measuring out-of-sample error while training DeepONets.
For a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved.
We show how the Huber loss can be chosen so that for these DeepONet classes generalization error bounds can be obtained that have no explicit dependence on the size of the nets.
arXiv Detail & Related papers (2022-05-23T14:45:34Z)
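For reference, a minimal implementation of the Huber loss mentioned above: quadratic for small residuals, linear for large ones. The paper's bounds concern how the threshold is chosen for DeepONet classes; delta here is just an illustrative free parameter.

```python
import numpy as np

def huber(residual, delta=1.0):
    # 0.5*r^2 inside the threshold, delta*(|r| - 0.5*delta) outside
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

print(huber(np.array([-3.0, -0.5, 0.2, 2.0])))
```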
- Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks [27.614609336582568]
We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks.
We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics.
arXiv Detail & Related papers (2021-10-20T22:57:03Z)
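A minimal sketch of the Feedback Alignment update for a depth-3 linear network (not the paper's code): the backward pass propagates the error through fixed random feedback matrices instead of the transposed forward weights. Shapes, initialization, and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, N = 10, 16, 4, 200
X = rng.standard_normal((d_in, N))
Y = rng.standard_normal((d_out, d_in)) @ X       # targets from a linear teacher

W1 = 0.1 * rng.standard_normal((d_h, d_in))
W2 = 0.1 * rng.standard_normal((d_h, d_h))
W3 = 0.1 * rng.standard_normal((d_out, d_h))
B3 = rng.standard_normal((d_h, d_out))           # fixed feedback, replaces W3.T
B2 = rng.standard_normal((d_h, d_h))             # fixed feedback, replaces W2.T

lr = 1e-3 / N
for _ in range(20000):
    h1 = W1 @ X
    h2 = W2 @ h1
    E = W3 @ h2 - Y                              # output error
    d2 = B3 @ E                                  # FA backward: B3 instead of W3.T
    d1 = B2 @ d2                                 # FA backward: B2 instead of W2.T
    W3 -= lr * E @ h2.T
    W2 -= lr * d2 @ h1.T
    W1 -= lr * d1 @ X.T
print("final mean-squared error:", np.mean((W3 @ W2 @ W1 @ X - Y) ** 2))
```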
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent [55.96478231566129]
We show that relative scales play an important role in determining the learned model.
We develop a technique for deriving the inductive bias of gradient flow.
arXiv Detail & Related papers (2021-02-19T07:10:48Z)
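A toy illustration of the "initialization shape" effect (an assumption-laden sketch, not the paper's construction): for the model x = u * v trained on an underdetermined system, gradient flow conserves u_i^2 - v_i^2 per coordinate, so the relative scale of u and v at initialization, and not just their product, selects the learned interpolant. The two runs below start from the same overall product scale but different shapes.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 40
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_star = np.zeros(n)
x_star[:4] = 1.0                                # sparse, nonnegative planted solution
b = A @ x_star

def train(u0, v0, lr=0.01, steps=40000):
    u, v = u0.copy(), v0.copy()
    for _ in range(steps):
        g = A.T @ (A @ (u * v) - b)             # gradient of 0.5*||Ax - b||^2 in x
        u, v = u - lr * v * g, v - lr * u * g   # simultaneous update
    return u * v

for name, u0, v0 in [("balanced u ~ v ", 0.02 * np.ones(n), 0.02 * np.ones(n)),
                     ("lopsided u >> v", 2.0 * np.ones(n), 0.0002 * np.ones(n))]:
    x = train(u0, v0)
    print(f"{name}: residual {np.linalg.norm(A @ x - b):.1e}, "
          f"l1 norm {np.abs(x).sum():.2f}")
```

The balanced (rich-regime) run recovers a sparse, l1-like interpolant; the lopsided run behaves like a linear model in v and lands near the denser minimum-l2 interpolant.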
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
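A sketch of the kind of diagnostic described above: estimating the spectral norm of a loss Hessian by power iteration on finite-difference Hessian-vector products. The quadratic loss is a stand-in so the estimate can be checked exactly; in practice the gradient oracle would come from autodiff.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
Q = M @ M.T / n                       # SPD matrix -> toy quadratic loss
grad = lambda w: Q @ w                # gradient of L(w) = 0.5 * w^T Q w

def hessian_norm(w, grad, iters=100, eps=1e-4):
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        # Hv ~ (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)
        hv = (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)
        lam = np.linalg.norm(hv)      # current estimate of the top eigenvalue
        v = hv / lam                  # power iteration step
    return lam

w = rng.standard_normal(n)
print("power-iteration estimate:", hessian_norm(w, grad))
print("exact largest eigenvalue:", np.linalg.eigvalsh(Q).max())
```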
- Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks [39.856439772974454]
We show that the width needed for efficient convergence to a global minimum is independent of the depth.
Our results suggest an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
arXiv Detail & Related papers (2020-01-16T18:48:34Z)
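A quick numerical illustration of the dynamical-isometry intuition behind this result (width and depth are illustrative): with orthogonal factors, the end-to-end map of a deep linear network has all singular values equal to 1 at any depth, whereas a variance-matched Gaussian product develops exploding and vanishing singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 64, 30

def end_to_end(init):
    P = np.eye(n)
    for _ in range(depth):
        if init == "orthogonal":
            W, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal
        else:
            W = rng.standard_normal((n, n)) / np.sqrt(n)      # variance-matched Gaussian
        P = W @ P
    return np.linalg.svd(P, compute_uv=False)

for init in ("orthogonal", "gaussian"):
    s = end_to_end(init)
    print(f"{init:>10}: singular values in [{s.min():.2e}, {s.max():.2e}]")
```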
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.