PEP: Parameter Ensembling by Perturbation
- URL: http://arxiv.org/abs/2010.12721v1
- Date: Sat, 24 Oct 2020 00:16:03 GMT
- Title: PEP: Parameter Ensembling by Perturbation
- Authors: Alireza Mehrtash, Purang Abolmaesumi, Polina Golland, Tina Kapur,
Demian Wassermann, William M. Wells III
- Abstract summary: Parameter Ensembling by Perturbation (PEP) constructs an ensemble of parameter values as random perturbations of the optimal parameter set from training.
PEP provides a small improvement in performance, and, in some cases, a substantial improvement in empirical calibration.
PEP can be used to probe the level of overfitting that occurred during training.
- Score: 13.221295194854642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembling is now recognized as an effective approach for increasing the
predictive performance and calibration of deep networks. We introduce a new
approach, Parameter Ensembling by Perturbation (PEP), that constructs an
ensemble of parameter values as random perturbations of the optimal parameter
set from training by a Gaussian with a single variance parameter. The variance
is chosen to maximize the log-likelihood of the ensemble average ($\mathbb{L}$)
on the validation data set. Empirically, and perhaps surprisingly, $\mathbb{L}$
has a well-defined maximum as the variance grows from zero (which corresponds
to the baseline model). Conveniently, the calibration of predictions also
tends to improve until the peak of $\mathbb{L}$ is reached. In most
experiments, PEP provides a small improvement in performance, and, in some
cases, a substantial improvement in empirical calibration. We show that this
"PEP effect" (the gain in log-likelihood) is related to the mean curvature of
the likelihood function and the empirical Fisher information. Experiments on
ImageNet pre-trained networks including ResNet, DenseNet, and Inception showed
improved calibration and likelihood. We further observed a mild improvement in
classification accuracy on these networks. Experiments on classification
benchmarks such as MNIST and CIFAR-10 showed improved calibration and
likelihood, as well as the relationship between the PEP effect and overfitting;
this demonstrates that PEP can be used to probe the level of overfitting that
occurred during training. In general, no special training procedure or network
architecture is needed, and in the case of pre-trained networks, no additional
training is needed.
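As a concrete illustration of the procedure in the abstract, here is a minimal PyTorch sketch, assuming a trained classifier `model` and a validation loader; the helper names, ensemble size, and sigma grid are ours, not the paper's.

```python
import copy
import torch
import torch.nn.functional as F

def pep_predict(model, x, sigma, n_members=10):
    """Average softmax predictions over an ensemble of Gaussian
    perturbations of the trained parameters (the PEP ensemble)."""
    probs = 0.0
    for _ in range(n_members):
        member = copy.deepcopy(model)  # deepcopy for clarity, not speed
        with torch.no_grad():
            for p in member.parameters():
                p.add_(sigma * torch.randn_like(p))  # theta + N(0, sigma^2 I)
        member.eval()
        with torch.no_grad():
            probs = probs + F.softmax(member(x), dim=-1)
    return probs / n_members

def choose_sigma(model, val_loader, sigmas):
    """Pick the sigma that maximizes the validation log-likelihood
    of the ensemble-average prediction (the L of the abstract)."""
    best_sigma, best_ll = None, -float("inf")
    for sigma in sigmas:
        ll = 0.0
        for x, y in val_loader:
            p = pep_predict(model, x, sigma)
            ll += torch.log(p.gather(1, y.unsqueeze(1)).clamp_min(1e-12)).sum().item()
        if ll > best_ll:
            best_sigma, best_ll = sigma, ll
    return best_sigma
```

Sweeping `sigmas` upward from zero reproduces the setting described above, in which $\mathbb{L}$ first rises and then falls; the maximizing sigma defines the PEP ensemble.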
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable by our training procedure, whose gradient-based optimizer and regularizers limit flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration [16.930766717110053]
We propose a novel meta-set-based cascaded temperature regression method for post-hoc calibration.
We partition each meta-set into subgroups based on predicted category and confidence level, capturing diverse uncertainties.
A regression network is then trained to derive category-specific and confidence-level-specific scaling, achieving calibration across meta-sets.
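The meta-set-based cascade is more elaborate than the summary conveys; as a rough sketch of the core idea, here is plain temperature scaling fit separately per (predicted-class, confidence-bin) subgroup, with all function names and the binning rule being our assumptions:

```python
import torch

def fit_temperature(logits, labels, iters=200, lr=0.01):
    """Fit a single temperature T by minimizing NLL on held-out logits."""
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

def subgroup_temperatures(logits, labels, n_bins=3):
    """One temperature per (predicted class, confidence bin) subgroup."""
    conf, pred = logits.softmax(dim=-1).max(dim=-1)
    bins = (conf * n_bins).long().clamp_max(n_bins - 1)
    temps = {}
    for c in pred.unique().tolist():
        for b in range(n_bins):
            mask = (pred == c) & (bins == b)
            if mask.sum() > 10:  # need enough samples to fit reliably
                temps[(c, b)] = fit_temperature(logits[mask], labels[mask])
    return temps
```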
arXiv Detail & Related papers (2024-02-14T14:35:57Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
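The summary does not spell out SIFT's selection rule; the following is a generic sketch of gradient-sparse fine-tuning that updates only the largest-magnitude gradient entries, an assumption for illustration rather than the paper's exact algorithm:

```python
import torch

def sparse_finetune_step(model, loss, optimizer, density=0.01):
    """One fine-tuning step that keeps only the top `density`
    fraction of gradient entries and zeroes out the rest."""
    optimizer.zero_grad()
    loss.backward()
    grads = torch.cat([p.grad.abs().flatten() for p in model.parameters()
                       if p.grad is not None])
    k = max(1, int(density * grads.numel()))
    threshold = torch.topk(grads, k).values.min()  # k-th largest |grad|
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_((p.grad.abs() >= threshold).float())
    optimizer.step()
```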
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for computing the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
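The variational sparse-GP construction itself is beyond a short snippet; for orientation, here is the standard last-layer Laplace approximation for regression that LLA-style methods start from (this is the textbook baseline, not the paper's method):

```python
import torch

def last_layer_laplace(features, targets, noise_var=0.1, prior_var=1.0):
    """Gaussian posterior over last-layer weights w given fixed
    features Phi: precision = Phi^T Phi / noise_var + I / prior_var."""
    n, d = features.shape
    precision = features.T @ features / noise_var + torch.eye(d) / prior_var
    cov = torch.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

def predictive(phi_test, mean, cov, noise_var=0.1):
    """Predictive mean and variance at test features phi_test (m, d)."""
    mu = phi_test @ mean
    var = noise_var + (phi_test @ cov * phi_test).sum(dim=-1)
    return mu, var
```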
arXiv Detail & Related papers (2023-02-24T10:32:30Z)
- Reliable Prediction Intervals with Directly Optimized Inductive Conformal Regression for Deep Learning [3.42658286826597]
Prediction intervals (PIs) are used to quantify the uncertainty of each prediction in deep learning regression.
Many approaches to improve the quality of PIs can effectively reduce the width of PIs, but they do not ensure that enough real labels are captured.
In this study, we use Directly Optimized Inductive Conformal Regression (DOICR) that takes only the average width of PIs as the loss function.
Benchmark experiments show that DOICR outperforms current state-of-the-art algorithms for regression problems.
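DOICR changes the training objective; as the reference recipe it builds on, here is standard inductive (split) conformal regression in NumPy:

```python
import numpy as np

def split_conformal_intervals(cal_pred, cal_true, test_pred, alpha=0.1):
    """Standard inductive conformal regression: use calibration-set
    residuals to build intervals with ~(1 - alpha) coverage."""
    residuals = np.abs(cal_true - cal_pred)
    n = len(residuals)
    # Finite-sample-corrected quantile of the nonconformity scores.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, q_level)
    return test_pred - q, test_pred + q  # lower, upper bounds
```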
arXiv Detail & Related papers (2023-02-02T04:46:14Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- Efficient training of physics-informed neural networks via importance sampling [2.9005223064604078]
Physics-Informed Neural Networks (PINNs) are a class of deep neural networks that are trained to solve systems governed by partial differential equations (PDEs).
We show that an importance sampling approach will improve the convergence behavior of PINNs training.
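A minimal sketch of residual-proportional importance sampling of collocation points for a toy 1D Poisson problem; the source term, candidate counts, and helper names are assumptions for illustration:

```python
import torch

def pde_residual(net, x):
    """Residual of u''(x) = f(x) for a toy 1D Poisson problem."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = -torch.sin(x)  # assumed source term, for illustration only
    return d2u - f

def sample_collocation(net, n_candidates=4096, n_points=256):
    """Importance sampling: draw training points with probability
    proportional to the magnitude of the current PDE residual."""
    candidates = torch.rand(n_candidates, 1)
    r = pde_residual(net, candidates).abs().flatten().detach()
    probs = r / r.sum().clamp_min(1e-12)
    idx = torch.multinomial(probs, n_points, replacement=True)
    return candidates[idx].detach()
```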
arXiv Detail & Related papers (2021-04-26T02:45:10Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
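For a concrete sense of the NNGP correspondence, here is a NumPy sketch of the arc-cosine kernel recursion for an infinitely wide ReLU MLP; the depth and variance hyperparameters are illustrative:

```python
import numpy as np

def relu_nngp_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """NNGP kernel of an infinitely wide ReLU MLP: the Cho-Saul
    arc-cosine recursion applied `depth` times to the input Gram matrix."""
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag) + 1e-12
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # E[relu(u) relu(v)] for (u, v) ~ N(0, pairwise restriction of K)
        K = sigma_b2 + sigma_w2 * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)
    return K
```

The resulting matrix can be used directly as the prior covariance of an exact GP regressor or classifier over the inputs `X`.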
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
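In PyTorch, prediction-time batch normalization amounts to re-enabling batch statistics at inference; a minimal sketch (the helper name is ours):

```python
import torch.nn as nn

def use_prediction_time_bn(model):
    """Switch BatchNorm layers to normalize with statistics of the
    incoming test batch instead of the training-time running averages."""
    model.eval()  # everything else (dropout, etc.) stays in eval mode
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()  # BN in train mode uses current batch statistics
    # Note: train mode also updates the running averages as a side effect;
    # deep-copy the model first if the original statistics must be kept.
    return model
```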
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of recently proposed training variants can be covered by a unified extrapolation framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by Equilibrium Propagation (EP) does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We propose bias-reduction techniques and apply them to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)