With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization
- URL: http://arxiv.org/abs/2201.11939v1
- Date: Fri, 28 Jan 2022 05:26:32 GMT
- Title: With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization
- Authors: James Wang, Cheng-Lin Yang
- Abstract summary: Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
- Score: 3.6321778403619285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalization of deep neural networks remains one of the main open problems
in machine learning. Previous theoretical works focused on deriving tight
bounds of model complexity, while empirical works revealed that neural networks
exhibit double descent with respect to both training sample counts and the
neural network size. In this paper, we empirically examined how different
layers of neural networks contribute differently to the model; we found that
early layers generally learn representations relevant to performance on both
training data and testing data. Contrarily, deeper layers only minimize
training risks and fail to generalize well with testing or mislabeled data. We
further illustrate the distance of trained weights to its initial value of
final layers has high correlation to generalization errors and can serve as an
indicator of an overfit of model. Moreover, we show evidence to support
post-training regularization by re-initializing weights of final layers. Our
findings provide an efficient method to estimate the generalization capability
of neural networks, and the insight of those quantitative results may inspire
derivation to better generalization bounds that take the internal structure of
neural networks into consideration.
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Fundamental limits of overparametrized shallow neural networks for
supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error.
arXiv Detail & Related papers (2023-07-11T08:30:50Z) - Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Learning and Generalization in Overparameterized Normalizing Flows [13.074242275886977]
Normalizing flows (NFs) constitute an important class of models in unsupervised learning.
We provide theoretical and empirical evidence that for a class of NFs containing most of the existing NF models, overparametrization hurts training.
We prove that unconstrained NFs can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized.
arXiv Detail & Related papers (2021-06-19T17:11:42Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Persistent Homology Captures the Generalization of Neural Networks
Without A Validation Set [0.0]
We suggest studying the training of neural networks with Algebraic Topology, specifically Persistent Homology.
Using simplicial complex representations of neural networks, we study the PH diagram distance evolution on the neural network learning process.
Results show that the PH diagram distance between consecutive neural network states correlates with the validation accuracy.
arXiv Detail & Related papers (2021-05-31T09:17:31Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.