Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
- URL: http://arxiv.org/abs/2003.02139v2
- Date: Mon, 25 May 2020 17:43:35 GMT
- Title: Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
- Authors: Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson
- Abstract summary: Neural networks appear to have mysterious generalization properties when parameter counting is used as a proxy for complexity.
We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data.
- Score: 36.712632126776285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks appear to have mysterious generalization properties when
using parameter counting as a proxy for complexity. Indeed, neural networks
often have many more parameters than there are data points, yet still provide
good generalization performance. Moreover, when we measure generalization as a
function of parameters, we see double descent behaviour, where the test error
decreases, increases, and then again decreases. We show that many of these
properties become understandable when viewed through the lens of effective
dimensionality, which measures the dimensionality of the parameter space
determined by the data. We relate effective dimensionality to posterior
contraction in Bayesian deep learning, model selection, width-depth tradeoffs,
double descent, and functional diversity in loss surfaces, leading to a richer
understanding of the interplay between parameters and functions in deep models.
We also show that effective dimensionality compares favourably to alternative
norm- and flatness-based generalization measures.
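For concreteness, the effective dimensionality used in this line of work (going back to MacKay's evidence framework) is computed from the eigenvalues λ_i of the Hessian of the loss as N_eff(H, z) = Σ_i λ_i / (λ_i + z), where z > 0 is a regularization constant: directions the data pins down sharply (λ_i ≫ z) each count as roughly one dimension, while flat directions count for almost nothing. A minimal sketch, assuming the eigenvalues have already been computed (e.g. by a Lanczos routine on the Hessian):

```python
import numpy as np

def effective_dimensionality(eigenvalues, z=1.0):
    """N_eff = sum_i lam_i / (lam_i + z) over Hessian eigenvalues lam_i.

    Eigenvalues much larger than z (directions well determined by the data)
    contribute ~1; eigenvalues much smaller than z (flat directions left at
    the prior) contribute ~0. `z` plays the role of a regularization constant,
    e.g. a prior precision.
    """
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)  # guard against tiny negative estimates
    return float(np.sum(lam / (lam + z)))

# Toy example: 1000 raw parameters, but only a handful of directions are
# sharply determined by the data.
eigs = np.concatenate([[150.0, 40.0, 12.0], np.full(997, 1e-4)])
print(effective_dimensionality(eigs, z=1.0))  # ~= 3.0, far below the 1000 raw parameters
```

On this view, two models with identical raw parameter counts can have very different effective dimensionalities, which is the quantity the paper tracks across posterior contraction, width-depth tradeoffs, and double descent.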
Related papers
- Geometry-induced Implicit Regularization in Deep ReLU Neural Networks [0.0]
Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
arXiv Detail & Related papers (2024-02-13T07:49:57Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - RENs: Relevance Encoding Networks [0.0]
This paper proposes relevance encoding networks (RENs): a novel probabilistic VAE-based framework that uses the automatic relevance determination (ARD) prior in the latent space to learn the data-specific bottleneck dimensionality.
We show that the proposed model learns the relevant latent bottleneck dimensionality without compromising the representation and generation quality of the samples.
arXiv Detail & Related papers (2022-05-25T21:53:48Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [52.624194343095304]
We argue that analyzing fine-tuning through the lens of intrinsic dimension provides empirical and theoretical intuitions for its effectiveness.
We empirically show that common pre-trained models have a very low intrinsic dimension; a sketch of one standard way to measure intrinsic dimension appears after this list.
arXiv Detail & Related papers (2020-12-22T07:42:30Z) - Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z) - Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
We analyze and demonstrate the superiority of the proposed feature map distortion for producing deep neural networks with higher test performance.
arXiv Detail & Related papers (2020-02-23T13:59:13Z) - A Geometric Modeling of Occam's Razor in Deep Learning [8.007631014276896]
Deep neural networks (DNNs) benefit from very high-dimensional parameter spaces.
The contrast between their huge parameter complexity and their stunning practical performance is all the more intriguing and remains unexplained.
We propose a geometrically flavored information-theoretic approach to study this phenomenon.
arXiv Detail & Related papers (2019-05-27T07:57:26Z)
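On the intrinsic-dimension measurement referenced in the "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning" entry above: a common way to operationalize it is to train only a low-dimensional vector d and map it into the full parameter space through a fixed random projection, θ = θ_0 + P d; the smallest subspace dimension that recovers most of full fine-tuning accuracy is reported as the intrinsic dimension. A minimal NumPy sketch of this reparameterization, with toy sizes chosen purely for illustration (not that paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

D = 10_000          # full parameter count (toy scale, hypothetical)
d_subspace = 50     # dimension of the subspace that is actually trained

theta_0 = rng.normal(size=D)                                # pre-trained weights, kept frozen
P = rng.normal(size=(D, d_subspace)) / np.sqrt(d_subspace)  # fixed random projection
d = np.zeros(d_subspace)                                    # the only trainable variables

def full_parameters(d):
    # theta = theta_0 + P @ d: the model is evaluated with these parameters,
    # but gradients are taken only with respect to the low-dimensional d.
    return theta_0 + P @ d

print(full_parameters(d).shape)  # (10000,) -- a D-dimensional model steered by 50 numbers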
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.