Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning
- URL: http://arxiv.org/abs/2104.04975v1
- Date: Sun, 11 Apr 2021 09:50:24 GMT
- Title: Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning
- Authors: Alexander Immer, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch,
Mohammad Emtiyaz Khan
- Abstract summary: Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
- Score: 78.83598532168256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Marginal-likelihood based model-selection, even though promising, is rarely
used in deep learning due to estimation difficulties. Instead, most approaches
rely on validation data, which may not be readily available. In this work, we
present a scalable marginal-likelihood estimation method to select both the
hyperparameters and network architecture based on the training data alone. Some
hyperparameters can be estimated online during training, simplifying the
procedure. Our marginal-likelihood estimate is based on Laplace's method and
Gauss-Newton approximations to the Hessian, and it outperforms cross-validation
and manual-tuning on standard regression and image classification datasets,
especially in terms of calibration and out-of-distribution detection. Our work
shows that marginal likelihoods can improve generalization and be useful when
validation data is unavailable (e.g., in nonstationary settings).
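To make the recipe concrete, below is a minimal self-contained sketch of a Laplace log-marginal-likelihood estimate for Bayesian linear regression, a case where the Gauss-Newton Hessian (and hence the Laplace estimate) is exact; all names and the toy data are illustrative, and this is not the authors' implementation.

```python
# Minimal sketch: Laplace / Gauss-Newton log marginal likelihood for
# Bayesian linear regression, used to pick a hyperparameter from
# training data alone (no validation split).
import numpy as np

def laplace_log_marglik(X, y, prior_prec, noise_prec):
    n, d = X.shape
    # Hessian of the negative log joint at the MAP: GGN term + prior term
    H = noise_prec * X.T @ X + prior_prec * np.eye(d)
    w_map = noise_prec * np.linalg.solve(H, X.T @ y)
    resid = y - X @ w_map
    log_lik = 0.5 * n * np.log(noise_prec / (2 * np.pi)) \
              - 0.5 * noise_prec * resid @ resid
    log_prior = 0.5 * d * np.log(prior_prec / (2 * np.pi)) \
                - 0.5 * prior_prec * w_map @ w_map
    # Laplace: log p(D) ~= log p(D|w*) + log p(w*) + d/2 log(2 pi) - 1/2 log|H|
    _, logdet_H = np.linalg.slogdet(H)
    return log_lik + log_prior + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet_H

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

# Select the prior precision by maximizing the marginal-likelihood estimate.
candidates = [0.01, 0.1, 1.0, 10.0]
best = max(candidates, key=lambda a: laplace_log_marglik(X, y, a, noise_prec=100.0))
print("selected prior precision:", best)
```

In the deep-learning setting of the paper, H would be replaced by a scalable structured approximation (e.g., Kronecker-factored) to the network's Gauss-Newton matrix.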
Related papers
- Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification that rejects examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
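A rough illustration of the selective-calibration idea above, with a fixed confidence threshold standing in for the paper's learned selector network (toy data and the standard binned ECE; assumptions mine):

```python
# Toy: calibration error of a miscalibrated model, on all examples
# versus on the subset a (stand-in) selector accepts.
import numpy as np

def ece(conf, correct, bins=10):
    # Expected calibration error over equal-width confidence bins
    edges = np.linspace(0.0, 1.0, bins + 1)
    err, n = 0.0, len(conf)
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            err += m.sum() / n * abs(conf[m].mean() - correct[m].mean())
    return err

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=5000)
correct = rng.random(5000) < conf ** 2     # toy overconfident model
accept = conf > 0.8                        # stand-in for the learned selector
print("ECE on all examples:   ", ece(conf, correct))
print("ECE on accepted subset:", ece(conf[accept], correct[accept]))
```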
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
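The method above is gradient-based and relies on a differentiable Kronecker-factored Laplace approximation; as a hedged stand-in, the toy below scores two hypothetical augmentations with the closed-form linear-model marginal likelihood and prefers the one consistent with the data:

```python
# Toy: marginal likelihood favors a true invariance over a false one.
import numpy as np

def log_marglik(X, y, alpha=1.0, beta=25.0):
    # Exact log marginal likelihood of Bayesian linear regression
    n, d = X.shape
    H = beta * X.T @ X + alpha * np.eye(d)
    w = beta * np.linalg.solve(H, X.T @ y)
    r = y - X @ w
    _, ld = np.linalg.slogdet(H)
    return (0.5 * n * np.log(beta / (2 * np.pi)) - 0.5 * beta * r @ r
            + 0.5 * d * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * w @ w
            + 0.5 * d * np.log(2 * np.pi) - 0.5 * ld)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=200)  # truly invariant to x0's sign

augmented = {
    "flip sign of x0 (true invariance) ": np.vstack([X, X * [-1.0, 1.0]]),
    "flip sign of x1 (false invariance)": np.vstack([X, X * [1.0, -1.0]]),
}
for name, Xa in augmented.items():
    ya = np.tile(y, 2)
    # Higher per-point score -> augmentation is consistent with the data
    print(name, log_marglik(Xa, ya) / len(ya))
```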
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
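A short sketch of the ATC recipe as summarized above (my reading; names and toy data are illustrative): calibrate the threshold so the fraction of source examples above it matches source accuracy, then apply it to unlabeled target confidences.

```python
# Sketch of Average Thresholded Confidence (simplified).
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    # Pick t so the fraction of source examples with confidence above t
    # equals the observed source accuracy.
    t = np.quantile(src_conf, 1.0 - src_correct.mean())
    return float((tgt_conf > t).mean())  # predicted target accuracy

rng = np.random.default_rng(0)
src_conf = rng.beta(5, 2, size=10000)        # toy source confidences
src_correct = rng.random(10000) < src_conf   # roughly calibrated source model
tgt_conf = rng.beta(4, 3, size=10000)        # shifted, lower target confidences
print("predicted target accuracy:",
      atc_predict_accuracy(src_conf, src_correct, tgt_conf))
```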
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum-variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
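A hedged sketch of one plausible bias-constrained loss consistent with the summary above (an assumed form, not the authors' exact objective): penalize the squared empirical bias, estimated over groups of samples that share the same true parameter, alongside the usual MSE.

```python
# Assumed form of a bias-constrained loss: MSE plus a squared-bias penalty.
import numpy as np

def bias_constrained_loss(pred, theta, groups, lam=1.0):
    mse = np.mean((pred - theta) ** 2)
    # Empirical bias per group: mean error over samples sharing one theta
    bias_sq = np.mean([np.mean(pred[groups == g] - theta[groups == g]) ** 2
                       for g in np.unique(groups)])
    return mse + lam * bias_sq

# Toy usage: three groups of noisy estimates of three unknowns
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 50)
theta = np.array([1.0, 2.0, 3.0])[groups]
pred = theta + 0.3 + 0.1 * rng.normal(size=150)  # systematically biased
print(bias_constrained_loss(pred, theta, groups))
```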
- Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models [7.217783736464403]
Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive.
New approaches combining variational inference with normalizing flows are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space.
We propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and the weights of a neural network surrogate model.
arXiv Detail & Related papers (2021-08-28T14:31:45Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound to the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
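A toy illustration of the point above: when exact class-conditional densities are available (unit Gaussians here, standing in for the normalizing flows used in the paper), the Bayes error can be estimated directly by Monte Carlo.

```python
# Bayes error from exact class-conditional densities, by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200000)
x = rng.normal(loc=2.0 * labels - 1.0)   # p(x|c) = N(+/-1, 1), equal priors
# Unnormalized Gaussian densities; the constant cancels in the posterior
p0 = np.exp(-0.5 * (x + 1.0) ** 2)
p1 = np.exp(-0.5 * (x - 1.0) ** 2)
posterior_max = np.maximum(p0, p1) / (p0 + p1)
print(np.mean(1.0 - posterior_max))      # estimated Bayes error, ~0.159
```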
This list is automatically generated from the titles and abstracts of the papers on this site.
This site makes no guarantee of the quality of its content (including all information) and accepts no responsibility for any consequences arising from its use.