Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
- URL: http://arxiv.org/abs/2003.02139v2
- Date: Mon, 25 May 2020 17:43:35 GMT
- Title: Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
- Authors: Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson
- Abstract summary: Neural networks appear to have mysterious generalization properties when parameter counting is used as a proxy for complexity.
We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data.
- Score: 36.712632126776285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks appear to have mysterious generalization properties when
using parameter counting as a proxy for complexity. Indeed, neural networks
often have many more parameters than there are data points, yet still provide
good generalization performance. Moreover, when we measure generalization as a
function of parameters, we see double descent behaviour, where the test error
decreases, increases, and then again decreases. We show that many of these
properties become understandable when viewed through the lens of effective
dimensionality, which measures the dimensionality of the parameter space
determined by the data. We relate effective dimensionality to posterior
contraction in Bayesian deep learning, model selection, width-depth tradeoffs,
double descent, and functional diversity in loss surfaces, leading to a richer
understanding of the interplay between parameters and functions in deep models.
We also show that effective dimensionality compares favourably to alternative
norm- and flatness-based generalization measures.
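For concreteness, the effective dimensionality used in this line of work (going back to MacKay's evidence framework) is computed from the eigenvalues λ_i of the Hessian of the loss as N_eff(H, z) = Σ_i λ_i / (λ_i + z), where z > 0 is a regularization constant: directions the data pins down sharply (λ_i ≫ z) each count as roughly one dimension, while flat directions count for almost nothing. A minimal sketch, assuming the eigenvalues have already been computed (e.g. by a Lanczos routine on the Hessian):

```python
import numpy as np

def effective_dimensionality(eigenvalues, z=1.0):
    """N_eff = sum_i lam_i / (lam_i + z) over Hessian eigenvalues lam_i.

    Eigenvalues much larger than z (directions well determined by the data)
    contribute ~1; eigenvalues much smaller than z (flat directions left at
    the prior) contribute ~0. `z` plays the role of a regularization constant,
    e.g. a prior precision.
    """
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)  # guard against tiny negative estimates
    return float(np.sum(lam / (lam + z)))

# Toy example: 1000 raw parameters, but only a handful of directions are
# sharply determined by the data.
eigs = np.concatenate([[150.0, 40.0, 12.0], np.full(997, 1e-4)])
print(effective_dimensionality(eigs, z=1.0))  # ~= 3.0, far below the 1000 raw parameters
```

On this view, two models with identical raw parameter counts can have very different effective dimensionalities, which is the quantity the paper tracks across posterior contraction, width-depth tradeoffs, and double descent.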
Related papers
- Geometry-induced Implicit Regularization in Deep ReLU Neural Networks [0.0]
Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
arXiv Detail & Related papers (2024-02-13T07:49:57Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - RENs: Relevance Encoding Networks [0.0]
This paper proposes relevance encoding networks (RENs): a novel probabilistic VAE-based framework that uses the automatic relevance determination (ARD) prior in the latent space to learn the data-specific bottleneck dimensionality.
We show that the proposed model learns the relevant latent bottleneck dimensionality without compromising the representation and generation quality of the samples.
arXiv Detail & Related papers (2022-05-25T21:53:48Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [52.624194343095304]
We argue that analyzing fine-tuning through the lens of intrinsic dimension provides empirical and theoretical intuitions for its effectiveness.
We empirically show that common pre-trained models have a very low intrinsic dimension; a sketch of one standard way to measure intrinsic dimension appears after this list.
arXiv Detail & Related papers (2020-12-22T07:42:30Z) - Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z) - Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
We analyze and demonstrate the superiority of the proposed feature map distortion for producing deep neural networks with higher test performance.
arXiv Detail & Related papers (2020-02-23T13:59:13Z) - A Geometric Modeling of Occam's Razor in Deep Learning [8.007631014276896]
Deep neural networks (DNNs) benefit from very high-dimensional parameter spaces.
The contrast between their huge parameter complexity and their stunning practical performance is all the more intriguing and remains unexplained.
We propose a geometrically flavored information-theoretic approach to study this phenomenon.
arXiv Detail & Related papers (2019-05-27T07:57:26Z)
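On the intrinsic-dimension measurement referenced in the "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning" entry above: a common way to operationalize it is to train only a low-dimensional vector d and map it into the full parameter space through a fixed random projection, θ = θ_0 + P d; the smallest subspace dimension that recovers most of full fine-tuning accuracy is reported as the intrinsic dimension. A minimal NumPy sketch of this reparameterization, with toy sizes chosen purely for illustration (not that paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

D = 10_000          # full parameter count (toy scale, hypothetical)
d_subspace = 50     # dimension of the subspace that is actually trained

theta_0 = rng.normal(size=D)                                # pre-trained weights, kept frozen
P = rng.normal(size=(D, d_subspace)) / np.sqrt(d_subspace)  # fixed random projection
d = np.zeros(d_subspace)                                    # the only trainable variables

def full_parameters(d):
    # theta = theta_0 + P @ d: the model is evaluated with these parameters,
    # but gradients are taken only with respect to the low-dimensional d.
    return theta_0 + P @ d

print(full_parameters(d).shape)  # (10000,) -- a D-dimensional model steered by 50 numbers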
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.