Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics
- URL: http://arxiv.org/abs/2106.00734v1
- Date: Tue, 1 Jun 2021 19:19:49 GMT
- Authors: Charles H. Martin and Michael W. Mahoney
- Abstract summary: We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
- Score: 61.49826776409194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To understand better the causes of good generalization performance in
state-of-the-art neural network (NN) models, we analyze a corpus of models
that was made publicly-available for a contest to predict the generalization
accuracy of NNs. These models include a wide range of qualities and were
trained with a range of architectures and regularization hyperparameters. We
identify what amounts to a Simpson's paradox: where "scale" metrics (from
traditional statistical learning theory) perform well overall but perform
poorly on subpartitions of the data of a given depth, when regularization
hyperparameters are varied; and where "shape" metrics (from Heavy-Tailed Self
Regularization theory) perform well on subpartitions of the data, when
hyperparameters are varied for models of a given depth, but perform poorly
overall when models with varying depths are aggregated. Our results highlight
the subtlety of comparing models when both architectures and hyperparameters are
varied, as well as the complementary role of implicit scale versus implicit
shape parameters in understanding NN model quality. Our results also suggest
caution when one tries to extract causal insight with a single metric applied
to aggregate data, and they highlight the need to go beyond one-size-fits-all
metrics based on upper bounds from generalization theory to describe the
performance of state-of-the-art NN models. Based on these findings, we present
two novel shape metrics, one data-independent, and the other data-dependent,
which can predict trends in the test accuracy of a series of NNs, of a fixed
architecture/depth, when varying solver hyperparameters.
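The abstract contrasts scale metrics (norm-based quantities from traditional statistical learning theory) with shape metrics (from Heavy-Tailed Self-Regularization theory, which studies the eigenvalue spectrum of layer weight matrices). The sketch below is illustrative only and does not reproduce the two new shape metrics the paper proposes. As assumptions, it uses the log10 of the largest eigenvalue of X = W^T W / N as a scale metric, and a continuous power-law maximum-likelihood estimate of the tail exponent of those eigenvalues as a shape metric. The 1/N normalization, the tail-selection rule (`tail_fraction`), and all function names are hypothetical choices made for this example.

```python
import numpy as np


def eigenvalues(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W / N (N = number of rows), in descending order.

    The 1/N normalization is an assumption of this sketch. W is assumed to
    have at least as many rows as columns so that X is full rank.
    """
    n_rows = W.shape[0]
    evals = np.linalg.eigvalsh(W.T @ W / n_rows)  # ascending order, X is symmetric
    return evals[::-1]


def log_spectral_norm(W: np.ndarray) -> float:
    """Scale metric: log10 of the largest eigenvalue of X."""
    return float(np.log10(eigenvalues(W)[0]))


def power_law_alpha(values: np.ndarray, k: int) -> float:
    """Shape metric: continuous power-law MLE, alpha = 1 + k / sum(ln(x_i / x_min)),
    fitted to the k largest values, with x_min the smallest of those k values."""
    if k < 2:
        raise ValueError("need at least two tail values")
    tail = np.sort(np.asarray(values, dtype=float))[::-1][:k]
    x_min = tail[-1]
    return float(1.0 + k / np.sum(np.log(tail / x_min)))


def layer_alpha(W: np.ndarray, tail_fraction: float = 0.1) -> float:
    """Fit power_law_alpha to the top `tail_fraction` of the layer's eigenvalues."""
    evals = eigenvalues(W)
    k = max(2, int(tail_fraction * len(evals)))
    return power_law_alpha(evals, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((500, 200))  # stand-in for a trained layer's weights
    for c in (1.0, 3.0):
        print(f"c={c}: scale={log_spectral_norm(c * W):.4f}, shape={layer_alpha(c * W):.4f}")
```

Rescaling W by c multiplies every eigenvalue by c squared, so the scale metric shifts by 2 log10(c) while the fitted exponent is unchanged; this is one way to see why the two kinds of metric can carry complementary information. An i.i.d. Gaussian W has a Marchenko-Pastur spectrum with no heavy tail, so the exponent fitted in the demo has no HT-SR interpretation; the demo only shows the invariance.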
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
(2024-05-01T15:59:00Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
(2023-11-13T01:48:08Z)
- Fairer and More Accurate Tabular Models Through NAS [14.147928131445852]
We propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO) in the first application to the very challenging domain of tabular data.
We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns.
We produce architectures that consistently dominate state-of-the-art bias mitigation methods either in fairness, accuracy or both.
(2023-10-18T17:56:24Z)
- On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
(2022-06-17T07:50:22Z)
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
(2022-03-23T12:52:49Z)
- Structure and Distribution Metric for Quantifying the Quality of Uncertainty: Assessing Gaussian Processes, Deep Neural Nets, and Deep Neural Operators for Regression [0.0]
We propose two comparison metrics that can be applied in arbitrary dimensions in regression tasks.
The structure metric assesses the similarity in shape and location of uncertainty with the true error, while the distribution metric quantifies the supported magnitudes between the two.
We apply these metrics to Gaussian Processes (GPs), Ensemble Deep Neural Nets (DNNs), and Ensemble Deep Neural Operators (DNOs) on high-dimensional and nonlinear test cases.
(2022-03-09T04:16:31Z)
- Nonparametric Functional Analysis of Generalized Linear Models Under Nonlinear Constraints [0.0]
This article introduces a novel nonparametric methodology for Generalized Linear Models.
It combines the strengths of the binary regression and latent variable formulations for categorical data.
It extends recently published parametric versions of the methodology and generalizes it.
(2021-10-11T04:49:59Z)
- Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) involves nonconvex-concave min-max optimization problems.
Prior work has shown the importance of overparameterization for the convergence of gradient descent (GD) to globally optimal solutions.
We show that in an overparameterized GAN with a 1-layer neural network generator and a linear discriminator, gradient descent/ascent (GDA) converges to a global saddle point of the underlying nonconvex-concave min-max problem.
(2021-04-12T16:23:37Z)
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [52.624194343095304]
We argue that analyzing fine-tuning through the lens of intrinsic dimension provides us with empirical and theoretical intuitions.
We empirically show that common pre-trained models have a very low intrinsic dimension.
(2020-12-22T07:42:30Z)
- Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
(2020-10-26T22:31:04Z)