A Geometric Modeling of Occam's Razor in Deep Learning
- URL: http://arxiv.org/abs/1905.11027v7
- Date: Thu, 31 Oct 2024 23:09:08 GMT
- Title: A Geometric Modeling of Occam's Razor in Deep Learning
- Authors: Ke Sun, Frank Nielsen,
- Abstract summary: deep neural networks (DNNs) benefit from very high dimensional parameter spaces.
Their huge parameter complexities vs. stunning performances in practice is all the more intriguing and not explainable.
We propose a geometrically flavored information-theoretic approach to study this phenomenon.
- Score: 8.007631014276896
- License:
- Abstract: Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces? Their huge parameter complexities vs. stunning performances in practice is all the more intriguing and not explainable using the standard theory of model selection for regular models. In this work, we propose a geometrically flavored information-theoretic approach to study this phenomenon. Namely, we introduce the locally varying dimensionality of the parameter space of neural network models by considering the number of significant dimensions of the Fisher information matrix, and model the parameter space as a manifold using the framework of singular semi-Riemannian geometry. We derive model complexity measures which yield short description lengths for deep neural network models based on their singularity analysis thus explaining the good performance of DNNs despite their large number of parameters.
Related papers
- Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and number of trainable parameters that grows linearly with feature dimensionality.
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
arXiv Detail & Related papers (2024-02-19T20:29:34Z) - Automatic Parameterization for Aerodynamic Shape Optimization via Deep
Geometric Learning [60.69217130006758]
We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization.
Both models are optimized to parameterize via deep geometric learning to embed human prior knowledge into learned geometric patterns.
We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.
arXiv Detail & Related papers (2023-05-03T13:45:40Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Designing Universal Causal Deep Learning Models: The Case of
Infinite-Dimensional Dynamical Systems from Stochastic Analysis [3.5450828190071655]
Causal operators (COs) play a central role in contemporary analysis.
There is still no canonical framework for designing Deep Learning (DL) models capable of approximating COs.
This paper proposes a "geometry-aware" solution to this open problem by introducing a DL model-design framework.
arXiv Detail & Related papers (2022-10-24T14:43:03Z) - On the Influence of Enforcing Model Identifiability on Learning dynamics
of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z) - Neural Operator with Regularity Structure for Modeling Dynamics Driven
by SPDEs [70.51212431290611]
Partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.
We propose the Neural Operator with Regularity Structure (NORS) which incorporates the feature vectors for modeling dynamics driven by SPDEs.
We conduct experiments on various of SPDEs including the dynamic Phi41 model and the 2d Navier-Stokes equation.
arXiv Detail & Related papers (2022-04-13T08:53:41Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: where "scale" metrics perform well overall but perform poorly on sub partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Intrinsic Dimensionality Explains the Effectiveness of Language Model
Fine-Tuning [52.624194343095304]
We argue that analyzing fine-tuning through the lens of intrinsic dimension provides us with empirical and theoretical intuitions.
We empirically show that common pre-trained models have a very low intrinsic dimension.
arXiv Detail & Related papers (2020-12-22T07:42:30Z) - Rethinking Parameter Counting in Deep Models: Effective Dimensionality
Revisited [36.712632126776285]
We show that neural networks have mysterious generalization properties when using parameter counting as a proxy for complexity.
We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data.
arXiv Detail & Related papers (2020-03-04T15:39:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.