Functional Ensemble Distillation
- URL: http://arxiv.org/abs/2206.02183v1
- Date: Sun, 5 Jun 2022 14:07:17 GMT
- Title: Functional Ensemble Distillation
- Authors: Coby Penso, Idan Achituve, Ethan Fetaya
- Abstract summary: We investigate how to best distill an ensemble's predictions using an efficient model.
We find that learning the distilled model via a simple augmentation scheme in the form of mixup augmentation significantly boosts the performance.
- Score: 18.34081591772928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian models have many desirable properties, most notably their ability
to generalize from limited data and to properly estimate the uncertainty in
their predictions. However, these benefits come at a steep computational cost
as Bayesian inference, in most cases, is computationally intractable. One
popular approach to alleviate this problem is using a Monte-Carlo estimation
with an ensemble of models sampled from the posterior. However, this approach
still comes at a significant computational cost, as one needs to store and run
multiple models at test time. In this work, we investigate how to best distill
an ensemble's predictions using an efficient model. First, we argue that
current approaches that simply return a distribution over predictions cannot
compute important properties, such as the covariance between predictions, which
can be valuable for further processing. Second, in many limited data settings,
all ensemble members achieve nearly zero training loss, namely, they produce
near-identical predictions on the training set, which results in sub-optimal
distilled models. To address both problems, we propose a novel and general
distillation approach, named Functional Ensemble Distillation (FED), and we
investigate how to best distill an ensemble in this setting. We find that
learning the distilled model via a simple augmentation scheme in the form of
mixup augmentation significantly boosts the performance. We evaluated our
method on several tasks and showed that it achieves superior results in both
accuracy and uncertainty estimation compared to current approaches.
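As a rough illustration of the mixup-based distillation objective (not the full FED model, which also produces correlated functional samples), the sketch below trains a hypothetical `student` network to match the ensemble's mean predictive distribution on mixup-augmented inputs; the names, the classification setting, and the hyperparameters are all assumptions, not the paper's exact recipe.

```python
# Minimal sketch of mixup-augmented ensemble distillation (assumes a
# classification task, a list of pretrained ensemble members, and a
# student network; the paper's FED model additionally produces correlated
# functional samples, which this sketch does not cover).
import torch
import torch.nn.functional as F

def mixup(x, alpha=1.0):
    """Return a convex combination of the batch with a shuffled copy of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[perm]

def distill_step(student, ensemble, x, optimizer, alpha=1.0):
    x_aug = mixup(x, alpha)                      # unlabeled mixup point
    with torch.no_grad():                        # teacher = ensemble mean prediction
        probs = torch.stack([m(x_aug).softmax(-1) for m in ensemble]).mean(0)
    log_q = student(x_aug).log_softmax(-1)       # student prediction
    loss = F.kl_div(log_q, probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

One plausible reading of the abstract is that the augmented points move away from the training inputs, where the ensemble members no longer produce near-identical predictions, giving the student a more informative distillation target.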
Related papers
- From Prediction to Action: Critical Role of Performance Estimation for Machine-Learning-Driven Materials Discovery [2.3243389656894595]
We argue that the lack of proper performance estimation methods from pre-computed data collections is a fundamental problem for improving data-driven materials discovery.
We propose a novel such estimator that, in contrast to naïve reward estimation, successfully predicts Gaussian processes with the "expected improvement" acquisition function.
arXiv Detail & Related papers (2023-11-27T05:29:43Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
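Purely as an illustration of the multiple-hypotheses idea (the paper's structured construction and its tessellation argument are not reproduced here), a generic RBF-feature regressor with several output heads can be trained with a winner-takes-all loss; every name and size below is a hypothetical choice.

```python
# Hypothetical sketch: RBF features feeding several regression heads, trained
# with a winner-takes-all loss so the heads specialize to different modes.
import torch
import torch.nn as nn

class RBFHypotheses(nn.Module):
    def __init__(self, in_dim, n_centers=32, n_hypotheses=5):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, in_dim))
        self.log_gamma = nn.Parameter(torch.zeros(1))
        self.heads = nn.Linear(n_centers, n_hypotheses)   # one scalar output per hypothesis

    def forward(self, x):
        d2 = torch.cdist(x, self.centers) ** 2            # squared distances to centers
        phi = torch.exp(-self.log_gamma.exp() * d2)       # RBF features
        return self.heads(phi)                            # (batch, n_hypotheses)

def wta_loss(preds, y):
    """Penalize only the closest hypothesis, which encourages diverse predictors."""
    errs = (preds - y.unsqueeze(-1)) ** 2                 # (batch, n_hypotheses)
    return errs.min(dim=-1).values.mean()
```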
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
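A minimal sketch of attending over a batch of point-to-model residuals to produce per-point inlier weights is given below; the layer sizes, the GRU-style state update, and the scoring head are assumptions rather than the paper's exact design.

```python
# Rough sketch: self-attention over per-point residuals updates a per-point
# estimation state and produces inlier weights (dimensions and the update
# rule are assumptions, not the paper's exact architecture).
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                # embed scalar residuals
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.update = nn.GRUCell(d_model, d_model)        # per-point estimation state
        self.score = nn.Linear(d_model, 1)

    def forward(self, residuals, state):
        # residuals: (batch, n_points, 1); state: (batch * n_points, d_model),
        # e.g. initialized to zeros before the first iteration.
        h = self.embed(residuals)
        ctx, _ = self.attn(h, h, h)                       # share consensus across points
        new_state = self.update(ctx.flatten(0, 1), state)
        weights = torch.sigmoid(self.score(new_state))    # per-point inlier weight
        return weights.view(residuals.shape[:2]), new_state
```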
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Better Batch for Deep Probabilistic Time Series Forecasting [15.31488551912888]
We propose an innovative training method that incorporates error autocorrelation to enhance probabilistic forecasting accuracy.
Our method constructs a mini-batch as a collection of $D$ consecutive time series segments for model training.
It explicitly learns a time-varying covariance matrix over each mini-batch, encoding error correlation among adjacent time steps.
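Under one reading of this summary, a mini-batch is built from D consecutive windows of the series and the forecast errors over D adjacent steps are scored jointly under a learned covariance; the parameterization below (a single learnable Cholesky factor rather than a time-varying covariance) is a simplification for illustration only.

```python
# Illustrative sketch (not the paper's exact parameterization): a mini-batch
# of D consecutive windows, with forecast errors over D adjacent steps scored
# under a learned D x D covariance.
import torch
from torch.distributions import MultivariateNormal

D = 4  # number of consecutive segments per mini-batch / jointly modeled steps

# Learnable Cholesky factor of the error covariance across adjacent steps.
chol_raw = torch.zeros(D, D, requires_grad=True)

def make_minibatch(series, t0):
    """A mini-batch of D consecutive length-D windows, each shifted by one step."""
    return torch.stack([series[t0 + i : t0 + i + D] for i in range(D)])

def correlated_error_nll(pred, target):
    """pred, target: (batch, D); score errors over D adjacent steps jointly."""
    scale = torch.tril(chol_raw, diagonal=-1) + torch.diag(torch.diagonal(chol_raw).exp())
    dist = MultivariateNormal(loc=pred, scale_tril=scale)
    return -dist.log_prob(target).mean()
```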
arXiv Detail & Related papers (2023-05-26T15:36:59Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
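For context, the estimator that GST starts from is the standard Straight-Through Gumbel-Softmax, sketched below; the gap construction itself is described in the paper and is not reproduced here.

```python
# Standard Straight-Through Gumbel-Softmax estimator, for reference only
# (GST modifies this baseline; its gap construction is not shown here).
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    """Forward pass returns a one-hot sample; gradients flow through the soft sample."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
    soft = F.softmax((logits + gumbel) / tau, dim=-1)
    hard = F.one_hot(soft.argmax(dim=-1), logits.size(-1)).to(soft.dtype)
    return hard + soft - soft.detach()  # straight-through trick

# Equivalent built-in: F.gumbel_softmax(logits, tau=tau, hard=True)
```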
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- Diversity Matters When Learning From Ensembles [20.05842308307947]
Deep ensembles excel in large-scale image classification tasks both in terms of prediction accuracy and calibration.
Despite being simple to train, the computation and memory cost of deep ensembles limits their practicability.
We propose a simple approach for closing this gap, i.e., bringing the distilled model's performance close to that of the full ensemble.
arXiv Detail & Related papers (2021-10-27T03:44:34Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
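A toy sketch of the two-estimate idea is shown below: one regularized ("tuned") model and one unregularized ("overfit") model, with their disagreement used as an exploration bonus; the concrete models and the form of the bonus are assumptions, not the paper's exact recipe.

```python
# Toy illustration of ROME's two point estimates; the specific models and the
# disagreement-based bonus below are assumptions for the sake of example.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

def fit_rome_pair(X, y):
    tuned = Ridge(alpha=10.0).fit(X, y)        # regularized point estimate
    overfit = LinearRegression().fit(X, y)     # unregularized / overfit estimate
    return tuned, overfit

def choose_action(tuned, overfit, candidate_features):
    reward = tuned.predict(candidate_features)
    bonus = np.abs(overfit.predict(candidate_features) - reward)  # where overfitting is largest
    return int(np.argmax(reward + bonus))
```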
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
The method, which we call prediction-time batch normalization, significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
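In PyTorch, one simple way to approximate prediction-time batch normalization is to let the BatchNorm layers recompute statistics from the current test batch while the rest of the model stays in eval mode; this is a minimal sketch, not the paper's exact implementation.

```python
# Minimal sketch: normalize with statistics of the current (test) batch
# instead of the stored running averages.
import torch
import torch.nn as nn

def predict_with_batch_stats(model, x):
    model.eval()
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    for m in model.modules():
        if isinstance(m, bn_types):
            m.train()      # use batch statistics; note this also updates running stats
    with torch.no_grad():
        out = model(x)
    model.eval()           # restore standard evaluation behaviour
    return out
```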
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated using a layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
- Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models.
We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
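As a loose stand-in (the paper trains the boosting and Gaussian process components jointly, which this two-stage sketch does not), one can fit boosted trees for the mean and a Gaussian process on the residuals using scikit-learn; all model choices below are illustrative assumptions.

```python
# Simplified two-stage stand-in: boosted trees for the mean, then a GP on the
# residuals. The paper's method trains both components jointly, which this
# sketch does not attempt.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_gpboost_like(X, y):
    booster = GradientBoostingRegressor().fit(X, y)
    residuals = y - booster.predict(X)
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, residuals)
    return booster, gp

def predict(booster, gp, X_new):
    gp_mean, gp_std = gp.predict(X_new, return_std=True)
    return booster.predict(X_new) + gp_mean, gp_std   # mean prediction and GP uncertainty
```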
arXiv Detail & Related papers (2020-04-06T13:19:54Z)