A Bayesian Perspective on Training Speed and Model Selection
- URL: http://arxiv.org/abs/2010.14499v1
- Date: Tue, 27 Oct 2020 17:56:14 GMT
- Title: A Bayesian Perspective on Training Speed and Model Selection
- Authors: Clare Lyle, Lisa Schut, Binxin Ru, Yarin Gal, Mark van der Wilk
- Abstract summary: We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
- Score: 51.15664724311443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We take a Bayesian perspective to illustrate a connection between training
speed and the marginal likelihood in linear models. This provides two major
insights: first, that a measure of a model's training speed can be used to
estimate its marginal likelihood. Second, that this measure, under certain
conditions, predicts the relative weighting of models in linear model
combinations trained to minimize a regression loss. We verify our results in
model selection tasks for linear models and for the infinite-width limit of
deep neural networks. We further provide encouraging empirical evidence that
the intuition developed in these settings also holds for deep neural networks
trained with stochastic gradient descent. Our results suggest a promising new
direction towards explaining why neural networks trained with stochastic
gradient descent are biased towards functions that generalize well.
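As a quick illustration of the first insight, the sketch below (a minimal example, not code from the paper; the toy data, prior precision `alpha`, and noise variance `sigma2` are illustrative assumptions) checks numerically that the log marginal likelihood of a conjugate Bayesian linear regression model equals the sum of one-step-ahead log posterior predictive probabilities accumulated while the data are assimilated one point at a time, which is the kind of "training speed" quantity the abstract connects to model evidence.

```python
# Minimal sketch: prequential decomposition of the log marginal likelihood
# in Bayesian linear regression, log p(y) = sum_i log p(y_i | y_<i).
# Settings (alpha, sigma2, toy data) are illustrative, not from the paper.
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
n, d = 50, 3
alpha, sigma2 = 1.0, 0.1                  # prior precision, noise variance
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)

# (1) Direct log marginal likelihood: y ~ N(0, X X^T / alpha + sigma2 I)
K = X @ X.T / alpha + sigma2 * np.eye(n)
log_ml_direct = multivariate_normal(mean=np.zeros(n), cov=K).logpdf(y)

# (2) Sequential estimate: accumulate log p(y_i | y_<i) under exact
#     Bayesian updates of the posterior over weights, one point at a time.
m, S = np.zeros(d), np.eye(d) / alpha     # prior mean and covariance
log_ml_sequential = 0.0
for x_i, y_i in zip(X, y):
    pred_mean = x_i @ m
    pred_var = x_i @ S @ x_i + sigma2
    log_ml_sequential += norm(pred_mean, np.sqrt(pred_var)).logpdf(y_i)
    # Rank-1 (Kalman-style) posterior update after observing (x_i, y_i)
    gain = S @ x_i / pred_var
    m = m + gain * (y_i - pred_mean)
    S = S - np.outer(gain, x_i @ S)

print(log_ml_direct, log_ml_sequential)   # the two quantities agree
```

A model that assimilates the data quickly (high predictive probability early in this sweep) accumulates a larger sum, which is the sense in which faster "training" corresponds to higher marginal likelihood in this linear setting.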
Related papers
- Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z) - A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z) - Quadratic models for understanding catapult dynamics of neural networks [15.381097076708535]
We show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" that arises when training such models with large learning rates.
Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.
arXiv Detail & Related papers (2022-05-24T05:03:06Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained by gradient descent to interpolate the training data.
We show that such networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting noisy training labels, while simultaneously achieving minimax-optimal test error.
In contrast to previous work on benign overfitting, which requires linear or kernel-based predictors, our analysis holds in a setting where both the model and the learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs because models must be trained before their performance can be predicted.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Embedded training of neural-network sub-grid-scale turbulence models [0.0]
The weights of a deep neural network model are optimized in conjunction with the governing flow equations to provide a model for sub-grid-scale stresses.
Training uses a gradient-descent method in which the adjoint Navier-Stokes equations provide the end-to-end sensitivities of the velocity fields with respect to the model weights.
arXiv Detail & Related papers (2021-05-03T17:28:39Z) - The Gaussian equivalence of generative models for learning with shallow
neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous analytical and numerical evidence corroborating the equivalence between learning on such data and learning on Gaussian data with matched first and second moments.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)