On the Role of Initialization on the Implicit Bias in Deep Linear Networks
- URL: http://arxiv.org/abs/2402.02454v1
- Date: Sun, 4 Feb 2024 11:54:07 GMT
- Title: On the Role of Initialization on the Implicit Bias in Deep Linear Networks
- Authors: Oria Gruber, Haim Avron
- Abstract summary: This study explores why deep networks that fit the training data perfectly still generalize, a phenomenon attributed to the implicit bias at play.
Various sources of implicit bias have been identified, such as step size, weight initialization, optimization algorithm, and number of parameters.
- Score: 8.272491066698041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite Deep Learning's (DL) empirical success, our theoretical understanding
of its efficacy remains limited. One notable paradox is that while conventional
wisdom discourages perfect data fitting, deep neural networks are designed to
do just that, yet they generalize effectively. This study focuses on exploring
this phenomenon, attributed to the implicit bias at play. Various sources of
implicit bias have been identified, such as step size, weight initialization,
optimization algorithm, and number of parameters. In this work, we focus on
investigating the implicit bias originating from weight initialization. To this
end, we examine the problem of solving underdetermined linear systems in
various contexts, scrutinizing the impact of initialization on the implicit
regularization when using deep networks to solve such systems. Our findings
elucidate the role of initialization in the optimization and generalization
paradoxes, contributing to a more comprehensive understanding of DL's
performance characteristics.
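To make the abstract's setting concrete, here is a minimal sketch (not the authors' code; the model, problem sizes, and step sizes are illustrative). It trains the classic depth-2 "diagonal linear network" reparameterization x = u*u - v*v by gradient descent on an underdetermined system Ax = b. Varying only the initialization scale alpha changes which interpolant is found: a small alpha tends toward a sparse solution, a large alpha toward the minimum-l2-norm one.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 30, 60, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_star = np.zeros(n)                        # sparse planted solution
x_star[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_star
x_l2 = A.T @ np.linalg.solve(A @ A.T, b)    # minimum-l2-norm interpolant

def train(alpha, lr=0.02, steps=30000):
    u = alpha * np.ones(n)                  # only the init scale changes
    v = alpha * np.ones(n)
    for _ in range(steps):
        x = u * u - v * v
        g = A.T @ (A @ x - b)               # gradient of 0.5*||Ax - b||^2 in x
        u, v = u - 2 * lr * u * g, v + 2 * lr * v * g
    return u * u - v * v

for alpha in (0.01, 1.0):
    x = train(alpha)
    print(f"alpha = {alpha}: ||x - sparse x*|| = {np.linalg.norm(x - x_star):.3f}, "
          f"||x - min-norm x|| = {np.linalg.norm(x - x_l2):.3f}")
```

Both runs drive the residual to near zero; only the character of the recovered solution changes with the initialization scale.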
Related papers
- Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
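As a rough illustration of the quantity $\mathcal{I}$-EDL builds on (not the paper's exact objective), the sketch below computes the Fisher information matrix of a Dirichlet distribution with concentration vector alpha, whose (j, k) entry is psi'(alpha_j) * delta_jk - psi'(alpha_0) with psi' the trigamma function, and turns its log-determinant into per-sample loss weights; the weighting scheme is hypothetical.

```python
import numpy as np
from scipy.special import polygamma

def dirichlet_fim_logdet(alpha):
    # FIM of Dirichlet(alpha): diag(trigamma(alpha)) - trigamma(sum(alpha))
    trigamma = lambda x: polygamma(1, x)
    fim = np.diag(trigamma(alpha)) - trigamma(alpha.sum())
    return np.linalg.slogdet(fim)[1]

# Two hypothetical samples: near-uniform evidence vs. confident evidence.
alphas = np.array([[1.1, 1.1, 1.1],
                   [20.0, 1.0, 1.0]])
info = np.array([dirichlet_fim_logdet(a) for a in alphas])
weights = np.exp(info) / np.exp(info).sum()   # illustrative reweighting only
print("FIM log-dets:", np.round(info, 2), "-> loss weights:", np.round(weights, 3))
```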
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
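A minimal illustration of the one-step-ahead prediction task at the heart of system identification, using a classical linear ARX model fit by least squares; the surveyed feedforward, convolutional, and recurrent networks replace this linear predictor with a learned nonlinear one. The simulated system and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
u = rng.standard_normal(T)                     # input signal
y = np.zeros(T)
for t in range(2, T):                          # simulated stable 2nd-order system
    y[t] = 1.5 * y[t-1] - 0.7 * y[t-2] + 1.0 * u[t-1] + 0.5 * u[t-2] \
           + 0.05 * rng.standard_normal()

# Regressors [y[t-1], y[t-2], u[t-1], u[t-2]] predicting y[t]
X = np.column_stack([y[1:-1], y[:-2], u[1:-1], u[:-2]])
target = y[2:]
theta, *_ = np.linalg.lstsq(X, target, rcond=None)
print("estimated coefficients:", np.round(theta, 3))
print("one-step RMSE:", np.sqrt(np.mean((X @ theta - target) ** 2)))
```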
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Towards Size-Independent Generalization Bounds for Deep Operator Nets [0.28123958518740544]
This work aims to advance the theory of measuring out-of-sample error while training DeepONets.
For a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved.
We show how the Huber loss can be chosen so that for these DeepONet classes generalization error bounds can be obtained that have no explicit dependence on the size of the nets.
arXiv Detail & Related papers (2022-05-23T14:45:34Z)
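For reference, a minimal implementation of the Huber loss mentioned above: quadratic for small residuals, linear for large ones. The paper's bounds concern how the threshold is chosen for DeepONet classes; delta here is just an illustrative free parameter.

```python
import numpy as np

def huber(residual, delta=1.0):
    # 0.5*r^2 inside the threshold, delta*(|r| - 0.5*delta) outside
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

print(huber(np.array([-3.0, -0.5, 0.2, 2.0])))
```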
- Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks [27.614609336582568]
We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks.
We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics.
arXiv Detail & Related papers (2021-10-20T22:57:03Z)
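A minimal sketch of the Feedback Alignment update for a depth-3 linear network (not the paper's code): the backward pass propagates the error through fixed random feedback matrices instead of the transposed forward weights. Shapes, initialization, and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, N = 10, 16, 4, 200
X = rng.standard_normal((d_in, N))
Y = rng.standard_normal((d_out, d_in)) @ X       # targets from a linear teacher

W1 = 0.1 * rng.standard_normal((d_h, d_in))
W2 = 0.1 * rng.standard_normal((d_h, d_h))
W3 = 0.1 * rng.standard_normal((d_out, d_h))
B3 = rng.standard_normal((d_h, d_out))           # fixed feedback, replaces W3.T
B2 = rng.standard_normal((d_h, d_h))             # fixed feedback, replaces W2.T

lr = 1e-3 / N
for _ in range(20000):
    h1 = W1 @ X
    h2 = W2 @ h1
    E = W3 @ h2 - Y                              # output error
    d2 = B3 @ E                                  # FA backward: B3 instead of W3.T
    d1 = B2 @ d2                                 # FA backward: B2 instead of W2.T
    W3 -= lr * E @ h2.T
    W2 -= lr * d2 @ h1.T
    W1 -= lr * d1 @ X.T
print("final mean-squared error:", np.mean((W3 @ W2 @ W1 @ X - Y) ** 2))
```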
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent [55.96478231566129]
We show that relative scales play an important role in determining the learned model.
We develop a technique for deriving the inductive bias of gradient flow.
arXiv Detail & Related papers (2021-02-19T07:10:48Z)
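A toy illustration of the "initialization shape" effect (an assumption-laden sketch, not the paper's construction): for the model x = u * v trained on an underdetermined system, gradient flow conserves u_i^2 - v_i^2 per coordinate, so the relative scale of u and v at initialization, and not just their product, selects the learned interpolant. The two runs below start from the same overall product scale but different shapes.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 40
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_star = np.zeros(n)
x_star[:4] = 1.0                                # sparse, nonnegative planted solution
b = A @ x_star

def train(u0, v0, lr=0.01, steps=40000):
    u, v = u0.copy(), v0.copy()
    for _ in range(steps):
        g = A.T @ (A @ (u * v) - b)             # gradient of 0.5*||Ax - b||^2 in x
        u, v = u - lr * v * g, v - lr * u * g   # simultaneous update
    return u * v

for name, u0, v0 in [("balanced u ~ v ", 0.02 * np.ones(n), 0.02 * np.ones(n)),
                     ("lopsided u >> v", 2.0 * np.ones(n), 0.0002 * np.ones(n))]:
    x = train(u0, v0)
    print(f"{name}: residual {np.linalg.norm(A @ x - b):.1e}, "
          f"l1 norm {np.abs(x).sum():.2f}")
```

The balanced (rich-regime) run recovers a sparse, l1-like interpolant; the lopsided run behaves like a linear model in v and lands near the denser minimum-l2 interpolant.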
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
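A sketch of the kind of diagnostic described above: estimating the spectral norm of a loss Hessian by power iteration on finite-difference Hessian-vector products. The quadratic loss is a stand-in so the estimate can be checked exactly; in practice the gradient oracle would come from autodiff.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
Q = M @ M.T / n                       # SPD matrix -> toy quadratic loss
grad = lambda w: Q @ w                # gradient of L(w) = 0.5 * w^T Q w

def hessian_norm(w, grad, iters=100, eps=1e-4):
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        # Hv ~ (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)
        hv = (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)
        lam = np.linalg.norm(hv)      # current estimate of the top eigenvalue
        v = hv / lam                  # power iteration step
    return lam

w = rng.standard_normal(n)
print("power-iteration estimate:", hessian_norm(w, grad))
print("exact largest eigenvalue:", np.linalg.eigvalsh(Q).max())
```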
- Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks [39.856439772974454]
We show that the width needed for efficient convergence to a global minimum is independent of the depth.
Our results suggest an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
arXiv Detail & Related papers (2020-01-16T18:48:34Z)
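A quick numerical illustration of the dynamical-isometry intuition behind this result (width and depth are illustrative): with orthogonal factors, the end-to-end map of a deep linear network has all singular values equal to 1 at any depth, whereas a variance-matched Gaussian product develops exploding and vanishing singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 64, 30

def end_to_end(init):
    P = np.eye(n)
    for _ in range(depth):
        if init == "orthogonal":
            W, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal
        else:
            W = rng.standard_normal((n, n)) / np.sqrt(n)      # variance-matched Gaussian
        P = W @ P
    return np.linalg.svd(P, compute_uv=False)

for init in ("orthogonal", "gaussian"):
    s = end_to_end(init)
    print(f"{init:>10}: singular values in [{s.min():.2e}, {s.max():.2e}]")
```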
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.