Optimal Training of Mean Variance Estimation Neural Networks
- URL: http://arxiv.org/abs/2302.08875v2
- Date: Thu, 3 Aug 2023 12:59:57 GMT
- Title: Optimal Training of Mean Variance Estimation Neural Networks
- Authors: Laurens Sluijterman, Eric Cator, Tom Heskes
- Abstract summary: This paper focusses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994).
An MVE network assumes that the data is produced from a normal distribution with a mean function and variance function.
We introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate.
- Score: 1.4610038284393165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focusses on the optimal implementation of a Mean Variance
Estimation network (MVE network) (Nix and Weigend, 1994). This type of network
is often used as a building block for uncertainty estimation methods in a
regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep
Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes
that the data is produced from a normal distribution with a mean function and
variance function. The MVE network outputs a mean and variance estimate and
optimizes the network parameters by minimizing the negative loglikelihood. In
our paper, we present two significant insights. Firstly, the convergence
difficulties reported in recent work can be relatively easily prevented by
following the simple yet often overlooked recommendation from the original
authors that a warm-up period should be used. During this period, only the mean
is optimized with a fixed variance. We demonstrate the effectiveness of this
step through experimentation, highlighting that it should be standard practice.
As a sidenote, we examine whether, after the warm-up, it is beneficial to fix
the mean while optimizing the variance or to optimize both simultaneously.
Here, we do not observe a substantial difference. Secondly, we introduce a
novel improvement of the MVE network: separate regularization of the mean and
the variance estimate. We demonstrate, both on toy examples and on a number of
benchmark UCI regression data sets, that following the original recommendations
and the novel separate regularization can lead to significant improvements.
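
The abstract describes a concrete recipe: a network with a mean output and a variance output trained by minimizing the Gaussian negative log-likelihood, a warm-up phase in which only the mean is optimized with a fixed variance, and separate regularization of the mean and the variance estimate. The PyTorch sketch below is an illustration of that recipe under stated assumptions, not the authors' reference implementation; the architecture, warm-up length, learning rate, and the two weight-decay values are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MVENetwork(nn.Module):
    """Two-headed network: one head estimates the mean, the other the variance."""

    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.mean_head = nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(), nn.Linear(hidden, 1))
        self.var_head = nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(), nn.Linear(hidden, 1))

    def forward(self, x):
        mean = self.mean_head(x)
        var = F.softplus(self.var_head(x)) + 1e-6  # keep the variance estimate positive
        return mean, var


def gaussian_nll(mean, var, y):
    """Negative log-likelihood of y under N(mean, var), up to an additive constant."""
    return 0.5 * (torch.log(var) + (y - mean) ** 2 / var).mean()


def train_mve(x, y, warmup_epochs=100, total_epochs=400):
    model = MVENetwork(in_dim=x.shape[1])

    # Separate regularization: a different (placeholder) weight decay per head.
    optimizer = torch.optim.Adam(
        [
            {"params": model.mean_head.parameters(), "weight_decay": 1e-4},
            {"params": model.var_head.parameters(), "weight_decay": 1e-3},
        ],
        lr=1e-3,
    )

    for epoch in range(total_epochs):
        optimizer.zero_grad()
        mean, var = model(x)
        if epoch < warmup_epochs:
            # Warm-up: optimize only the mean with the variance held fixed,
            # which reduces to an ordinary mean-squared-error fit.
            loss = ((y - mean) ** 2).mean()
        else:
            # Afterwards, minimize the full Gaussian negative log-likelihood.
            loss = gaussian_nll(mean, var, y)
        loss.backward()
        optimizer.step()
    return model
```

Using two optimizer parameter groups is one simple way to realize separate regularization of the mean and the variance heads; in practice the two weight decays would be tuned independently, which is the freedom the paper's proposal exploits.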
Related papers
- Favour: FAst Variance Operator for Uncertainty Rating [0.034530027457862]
Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions.
By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference.
Previous work proposed propagating the first and second moments of the posterior directly through the network.
This method is even slower than sampling, so the propagated variance needs to be approximated.
Our contribution is a more principled variance propagation framework.
arXiv Detail & Related papers (2023-11-21T22:53:20Z)
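
The Favour summary above mentions propagating the first and second moments of the posterior directly through the network. As a generic illustration of what moment propagation looks like for a single linear layer with Gaussian weights, the sketch below uses the textbook approximation under a mutual-independence assumption; it is not the Favour operator itself.

```python
import numpy as np


def propagate_linear(mu_x, var_x, w_mean, w_var, b_mean, b_var):
    """Push an input mean/variance through y = W x + b, where the weights W and
    bias b are Gaussian and all quantities are assumed mutually independent."""
    mean_out = w_mean @ mu_x + b_mean
    # Var[y_i] = sum_j [ w_var_ij * (var_x_j + mu_x_j^2) + w_mean_ij^2 * var_x_j ] + b_var_i
    var_out = w_var @ (var_x + mu_x**2) + (w_mean**2) @ var_x + b_var
    return mean_out, var_out


# Example: 3-dimensional input, 2-dimensional output.
rng = np.random.default_rng(0)
mu_x, var_x = rng.normal(size=3), np.full(3, 0.1)
w_mean, w_var = rng.normal(size=(2, 3)), np.full((2, 3), 0.05)
b_mean, b_var = np.zeros(2), np.full(2, 0.01)
print(propagate_linear(mu_x, var_x, w_mean, w_var, b_mean, b_var))
```

Nonlinear layers do not admit such exact formulas, so the propagated variance has to be approximated, as the summary notes.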
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is found in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
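
For reference, the weighted least squares estimator that the Gauss-Markov statement above refers to has a simple closed form. The sketch below is only the classical linear baseline, not the bias-constrained deep estimator proposed in that paper.

```python
import numpy as np


def weighted_least_squares(X, y, noise_var):
    """Closed-form WLS estimate with per-observation weights 1 / noise_var."""
    W = np.diag(1.0 / noise_var)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)


# Example: fit a line to heteroscedastic observations.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
noise_var = np.linspace(0.1, 1.0, 50)
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=np.sqrt(noise_var))
print(weighted_least_squares(X, y, noise_var))  # approximately [2.0, -1.0]
```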
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
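
The KL Guided Domain Adaptation summary above notes that, with a probabilistic representation network, the KL term can be estimated from minibatch samples. A generic Monte Carlo sketch of that idea follows; it is not necessarily the exact estimator used in the paper.

```python
import torch
from torch.distributions import Independent, Normal


def mc_kl(p_dist, q_dist, num_samples=128):
    """Monte Carlo estimate of KL(p || q) using samples drawn from p."""
    z = p_dist.rsample((num_samples,))  # a minibatch of representation samples
    return (p_dist.log_prob(z) - q_dist.log_prob(z)).mean()


# Example: 8-dimensional diagonal Gaussian representation distributions.
p = Independent(Normal(torch.zeros(8), torch.ones(8)), 1)
q = Independent(Normal(0.5 * torch.ones(8), torch.ones(8)), 1)
print(mc_kl(p, q))  # fluctuates around the closed-form value of 1.0
```

In a domain-adaptation setting, p and q would stand in for the source and target representation distributions produced by the encoder, and the estimate remains differentiable because rsample uses the reparameterization trick.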
- A Novel Regression Loss for Non-Parametric Uncertainty Optimization [7.766663822644739]
Quantification of uncertainty is one of the most promising approaches to establish safe machine learning.
One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice.
We propose a new objective, referred to as the second-moment loss, to address this issue.
arXiv Detail & Related papers (2021-01-07T19:12:06Z)
- Second-Moment Loss: A Novel Regression Objective for Improved Uncertainties [7.766663822644739]
Quantification of uncertainty is one of the most promising approaches to establish safe machine learning.
One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice.
We propose a new objective, referred to as the second-moment loss, to address this issue.
arXiv Detail & Related papers (2020-12-23T14:17:33Z)
- Recursive Inference for Variational Autoencoders [34.552283758419506]
Inference networks of traditional Variational Autoencoders (VAEs) are typically amortized, which can leave a gap between the amortized and the optimal per-instance posterior approximation.
Recent semi-amortized approaches were proposed to address this drawback.
We introduce an accurate amortized inference algorithm.
arXiv Detail & Related papers (2020-11-17T10:22:12Z)
- Temporal Action Localization with Variance-Aware Networks [12.364819165688628]
This work addresses the problem of temporal action localization with Variance-Aware Networks (VAN).
VANp is a network that propagates the mean and the variance throughout the network to deliver outputs with second order statistics.
Results show that VANp surpasses the accuracy of virtually all other two-stage networks without involving any additional parameters.
arXiv Detail & Related papers (2020-08-25T20:12:59Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We describe a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
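
As summarized above, prediction-time batch normalization replaces the running statistics collected during training with statistics computed on the test batch itself. One way to sketch the idea in PyTorch (an illustration, not the authors' code) is to switch only the normalization layers back into training mode at inference:

```python
import torch
import torch.nn as nn


def predict_with_batch_stats(model, x_batch):
    """Run inference normalizing with the statistics of the current test batch
    instead of the running averages accumulated during training."""
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.train()  # use batch statistics in the forward pass
    # Note: while in train mode the BatchNorm layers also update their running
    # statistics; save and restore those buffers if the model will be reused.
    with torch.no_grad():
        preds = model(x_batch)
    model.eval()  # restore standard evaluation behaviour
    return preds
```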
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)