How You Start Matters for Generalization
- URL: http://arxiv.org/abs/2206.08558v1
- Date: Fri, 17 Jun 2022 05:30:56 GMT
- Title: How You Start Matters for Generalization
- Authors: Sameera Ramasinghe, Lachlan MacDonald, Moshiur Farazi, Hemanth
Saratchandran, Simon Lucey
- Abstract summary: We show that the generalization of neural networks is heavily tied to their initialization.
We make a case against the controversial flat-minima conjecture.
- Score: 26.74340246715699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Characterizing the remarkable generalization properties of over-parameterized
neural networks remains an open problem. In this paper, we promote a shift of
focus towards initialization rather than neural architecture or (stochastic)
gradient descent to explain this implicit regularization. Through a Fourier
lens, we derive a general result for the spectral bias of neural networks and
show that the generalization of neural networks is heavily tied to their
initialization. Further, we empirically solidify the developed theoretical
insights using practical, deep networks. Finally, we make a case against the
controversial flat-minima conjecture and show that Fourier analysis grants a
more reliable framework for understanding the generalization of neural
networks.
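As a concrete, minimal sketch of the spectral-bias idea (not the paper's code; the architecture, initialization scales, and training setup below are all illustrative assumptions), one can fit a two-frequency 1-D signal with MLPs whose initial weights are rescaled, and then compare the Fourier spectra of the fitted functions:

```python
import numpy as np
import torch
import torch.nn as nn

def make_mlp(init_scale):
    # Standard ReLU MLP; the default PyTorch initialization is rescaled
    # by init_scale so the two runs differ only in how they start.
    net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(),
                        nn.Linear(256, 1))
    with torch.no_grad():
        for p in net.parameters():
            p.mul_(init_scale)
    return net

# Target signal: one low (2 Hz) and one high (20 Hz) component over [-1, 1].
x = torch.linspace(-1, 1, 512).unsqueeze(1)
y = torch.sin(2 * np.pi * 2 * x) + torch.sin(2 * np.pi * 20 * x)

for scale in (0.5, 2.0):  # hypothetical "small" vs. "large" init scales
    torch.manual_seed(0)  # same base draw for both runs
    net = make_mlp(scale)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    # rfft bin k holds k cycles over the 2-unit window, so the target's
    # components land in bins 4 and 40.
    spec = np.abs(np.fft.rfft(net(x).detach().numpy().ravel()))
    print(f"init scale {scale}: |bin 4| = {spec[4]:.2f}, "
          f"|bin 40| = {spec[40]:.2f}, final loss = {loss.item():.4f}")
```

If the paper's thesis holds, the low-frequency bin should be fitted readily in both runs, while the energy recovered at the high-frequency bin should depend on the initialization scale.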
Related papers
- Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize [5.642322814965062]
Learning representations that generalize under distribution shifts is critical for building robust machine learning models.
We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
arXiv Detail & Related papers (2024-06-05T15:04:27Z)
- Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders [2.255961793913651]
We study the learning problem associated with spiking neural networks.
We consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights.
arXiv Detail & Related papers (2024-04-06T08:17:07Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that helps the neural network learn higher-degree frequencies (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear (a minimal rank measurement is sketched after this list).
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for analyzing neural network optimization, such as mean-field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- Expressivity of Deep Neural Networks [2.7909470193274593]
In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks.
While the main body of existing results covers general feedforward architectures, we also describe approximation results for convolutional, residual, and recurrent neural networks.
arXiv Detail & Related papers (2020-07-09T13:08:01Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
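The Walsh-Hadamard entry above proposes a regularizer for higher-degree frequencies. As a toy illustration only (not the paper's method; the choice of n = 4, the penalty form, and the network below are all assumptions), one can compute a network's exact Walsh-Hadamard spectrum on a small Boolean cube and penalize the fraction of energy on low-degree characters:

```python
import itertools
import torch
import torch.nn as nn

n = 4  # small enough to enumerate all 2^n Boolean inputs exactly
cube = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=n)))
# Degree of the Walsh character indexed by i = popcount of i's bits.
degrees = torch.tensor([bin(i).count("1") for i in range(2 ** n)])
# Sylvester construction: H_{2k} = [[H_k, H_k], [H_k, -H_k]].
H = torch.tensor([[1.0]])
for _ in range(n):
    H = torch.cat([torch.cat([H, H], dim=1),
                   torch.cat([H, -H], dim=1)], dim=0)

def low_degree_penalty(net, max_degree=1):
    # Exact Walsh-Hadamard coefficients of the network on the cube, then
    # the fraction of spectral energy on characters of degree <= max_degree.
    coeffs = H @ net(cube).squeeze(-1) / (2 ** n)
    energy = coeffs ** 2
    return energy[degrees <= max_degree].sum() / (energy.sum() + 1e-12)

net = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, 1))
print(f"low-degree energy fraction: {low_degree_penalty(net).item():.3f}")
# In training one might add task_loss + lam * low_degree_penalty(net), with a
# hypothetical weight lam, to push spectral energy toward higher degrees.
```

The exact transform above only works for tiny n; the paper's contribution is a scalable version of this kind of spectral penalty.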
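Similarly, for the rank-diminishing entry, a minimal measurement (an illustrative sketch, not the paper's analysis; the depth, width, and batch size are assumptions) passes a random batch through a ReLU MLP and reports the numerical rank of the activations after each layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A deep, width-64 ReLU MLP; depth and width are illustrative choices.
layers = []
for _ in range(8):
    layers += [nn.Linear(64, 64), nn.ReLU()]
net = nn.Sequential(*layers)

h = torch.randn(256, 64)  # random input batch
with torch.no_grad():
    for i, layer in enumerate(net):
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            # Numerical rank of the (batch x width) activation matrix.
            r = torch.linalg.matrix_rank(h).item()
            print(f"after block {i // 2 + 1}: activation rank = {r}")
```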
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.