Bayesian hierarchical stacking
- URL: http://arxiv.org/abs/2101.08954v1
- Date: Fri, 22 Jan 2021 05:19:49 GMT
- Title: Bayesian hierarchical stacking
- Authors: Yuling Yao, Gregor Pirš, Aki Vehtari, Andrew Gelman
- Abstract summary: We show that stacking is most effective when the model predictive performance is heterogeneous in inputs.
With the input-varying yet partially-pooled model weights, hierarchical stacking improves average and conditional predictions.
- Score: 10.371079239965836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stacking is a widely used model averaging technique that yields
asymptotically optimal prediction among all linear averages. We show that
stacking is most effective when the model predictive performance is
heterogeneous in inputs, so that we can further improve the stacked mixture
with a hierarchical model. With the input-varying yet partially-pooled model
weights, hierarchical stacking improves average and conditional predictions.
Our Bayesian formulation includes constant-weight (complete-pooling) stacking
as a special case. We generalize to incorporate discrete and continuous inputs,
other structured priors, and time-series and longitudinal data. We demonstrate
the approach on several applied problems.
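To make the mechanics concrete, below is a minimal sketch (not the paper's reference implementation): complete-pooling stacking maximizes the leave-one-out log score over a single simplex weight vector, while hierarchical stacking lets the unnormalized weights vary with the input before softmax normalization. The leave-one-out log-density matrix lpd and the linear feature map are hypothetical placeholders.

```python
# A minimal sketch, assuming a precomputed matrix lpd[i, k] of leave-one-out
# log predictive densities log p(y_i | y_{-i}, M_k); not the paper's
# reference implementation.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax

def stacking_weights(lpd: np.ndarray) -> np.ndarray:
    """Complete pooling: maximize sum_i log(sum_k w_k exp(lpd[i, k]))
    over the simplex, via a softmax parameterization."""
    n, K = lpd.shape

    def neg_score(alpha):
        log_w = alpha - logsumexp(alpha)          # log of simplex weights
        return -logsumexp(lpd + log_w, axis=1).sum()

    alpha_hat = minimize(neg_score, np.zeros(K), method="BFGS").x
    return softmax(alpha_hat)

def hierarchical_weights(alpha0, beta, x):
    """Input-varying weights w_k(x) = softmax_k(alpha0_k + beta_k . x);
    in the paper the beta_k additionally get partial-pooling priors."""
    return softmax(alpha0 + x @ beta.T, axis=-1)

# Hypothetical usage: 3 models, 100 held-out points, 2 input features.
rng = np.random.default_rng(0)
print(stacking_weights(rng.normal(-1.0, 0.5, size=(100, 3))))
print(hierarchical_weights(np.zeros(3), rng.normal(size=(3, 2)),
                           rng.normal(size=(5, 2))).sum(axis=1))  # rows sum to 1
```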
Related papers
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\mathrm{post}}(\mathbf{x}) \propto p(\mathbf{x})\,r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
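As a naive illustration of the target distribution only (this is not the relative-trajectory-balance objective, which trains the diffusion sampler itself), one can reweight prior samples by the constraint via self-normalized importance sampling; the prior and r below are toy stand-ins.

```python
# Baseline illustrating the target p_post(x) ∝ p(x) r(x): sample the prior,
# reweight by r.  Toy densities only; not the paper's training method.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=100_000)      # toy prior p(x) = N(0, 1)
r = np.exp(-0.5 * (x - 2.0) ** 2)           # toy black-box constraint r(x)
w = r / r.sum()                             # self-normalized importance weights
print("posterior mean ≈", np.sum(w * x))    # analytic answer here: 1.0
```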
- Theoretical Guarantees of Data Augmented Last Layer Retraining Methods [5.352699766206809]
Linear last layer retraining strategies have been shown to achieve state-of-the-art performance for worst-group accuracy.
We present the optimal worst-group accuracy when modeling the distribution of the latent representations.
We evaluate and verify our results for both synthetic and large publicly available datasets.
arXiv Detail & Related papers (2024-05-09T17:16:54Z)
- BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python [0.0]
We introduce the BayesBlend Python package to estimate weights and blend multiple (Bayesian) models' predictive distributions.
BayesBlend implements pseudo-Bayesian model averaging, stacking and, uniquely, hierarchical Bayesian stacking to estimate model weights.
We demonstrate the usage of BayesBlend with examples of insurance loss modeling.
arXiv Detail & Related papers (2024-04-30T19:15:33Z)
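BayesBlend's actual API is not reproduced here; as a hedged sketch, the pseudo-Bayesian model averaging scheme it implements weights each model in proportion to its exponentiated expected log predictive density (e.g. estimated by LOO-CV). The elpd values and model names below are hypothetical.

```python
# Sketch of pseudo-BMA weighting (the textbook formula, not BayesBlend's API).
# The per-model elpd estimates, e.g. from LOO-CV, are assumed inputs.
import numpy as np
from scipy.special import softmax

elpd = np.array([-530.2, -528.7, -541.9])   # hypothetical elpd estimates
weights = softmax(elpd)                     # w_k ∝ exp(elpd_k)
print(dict(zip(["lognormal", "gamma", "weibull"], weights.round(3))))
```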
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
- Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We give a polynomial-time algorithm that finds the RUM best approximating the given distribution on average.
Our theoretical result can also be made practical: we obtain an algorithm that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z)
- Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition [0.5735035463793007]
We present two novel tools for combining predictions from different models.
These are generalisations of model stacking, but combine posterior densities by log-linear pooling and quantum superposition.
To optimise model weights while avoiding the burden of normalising constants, we investigate the Hyvärinen score of the combined posterior predictions.
arXiv Detail & Related papers (2023-05-12T09:26:26Z)
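To see why the normalizing constant drops out, here is a hedged sketch for one-dimensional Gaussian components: the Hyvärinen score 2∂²log q(x) + (∂ log q(x))² of the log-pooled density q(x) ∝ ∏_k p_k(x)^{w_k} depends only on derivatives of log q, so the pooling constant never appears. Weights and component parameters are hypothetical.

```python
# Sketch of log-linear pooling of Gaussian predictive densities and the
# Hyvärinen score of the pool; normalizing constants cancel because the
# score uses only derivatives of log q.  All parameter values are toy.
import numpy as np

mu = np.array([0.0, 1.5])                   # component means
sig2 = np.array([1.0, 0.5])                 # component variances
w = np.array([0.3, 0.7])                    # pooling weights

def hyvarinen_score(x):
    grad = np.sum(w * (-(x - mu) / sig2))   # d/dx log q(x), constant-free
    lap = np.sum(w * (-1.0 / sig2))         # d^2/dx^2 log q(x)
    return 2.0 * lap + grad ** 2

print(hyvarinen_score(1.0))
```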
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
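The summary does not spell out the construction; as one plausible reading (not the paper's exact method), a batch of features can be normalized by responsibility-weighted Gaussian-mixture statistics rather than a single batch mean and variance.

```python
# A hedged sketch, one plausible reading of mixture-based normalization:
# normalize each feature vector by its responsibility-weighted component
# statistics instead of a single batch mean/variance.
import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_normalize(feats: np.ndarray, n_components: int = 3) -> np.ndarray:
    gmm = GaussianMixture(n_components, covariance_type="diag",
                          random_state=0).fit(feats)
    resp = gmm.predict_proba(feats)          # (N, C) responsibilities
    mean = resp @ gmm.means_                 # per-sample blended mean
    var = resp @ gmm.covariances_            # per-sample blended variance
    return (feats - mean) / np.sqrt(var + 1e-5)

print(mixture_normalize(np.random.default_rng(2).normal(size=(256, 8))).std())
```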
- Continuously Generalized Ordinal Regression for Linear and Deep Models [41.03778663275373]
Ordinal regression is a classification task where classes have an order and prediction error increases the further the predicted class is from the true class.
We propose a new approach for modeling ordinal data that allows class-specific hyperplane slopes.
Our method significantly outperforms the standard ordinal logistic model over a thorough set of ordinal regression benchmark datasets.
arXiv Detail & Related papers (2022-02-14T19:49:05Z)
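For orientation, a hedged sketch of the fully class-specific end of this spectrum: cumulative logits with a separate slope per threshold, P(y ≤ j | x) = σ(θ_j − w_j · x). The continuous generalization between shared and class-specific slopes described in the paper is not reproduced here; all parameter values are hypothetical.

```python
# Toy non-parallel cumulative-logit model with per-threshold slopes.
# The standard ordinal logistic model shares one slope vector w across
# thresholds; this sketch gives each threshold its own w_j.
import numpy as np

def class_probs(x, theta, W):
    cum = 1.0 / (1.0 + np.exp(-(theta - W @ x)))   # P(y <= j), j = 0..K-2
    cum = np.concatenate([cum, [1.0]])             # P(y <= K-1) = 1
    # Without monotonicity constraints these differences can go negative.
    return np.diff(cum, prepend=0.0)

theta = np.array([-1.0, 0.5, 2.0])                    # K-1 = 3 thresholds
W = np.array([[0.8, -0.2], [0.9, -0.1], [1.1, 0.0]])  # per-threshold slopes
print(class_probs(np.array([0.3, 1.2]), theta, W))    # 4 class probabilities
```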
- Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation [7.02598981483736]
Multi-study ensembling uses a two-stage strategy which fits study-specific models and estimates ensemble weights separately.
This approach ignores the ensemble properties at the model-fitting stage, potentially resulting in a loss of efficiency.
We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy.
arXiv Detail & Related papers (2021-09-19T16:52:41Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
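The identity being exploited can be sketched as follows, with Gaussians standing in for the exact class-conditional densities a normalizing flow would supply: the Bayes error is E_x[1 − max_k p(k | x)], estimable by Monte Carlo sampling from the learned marginal.

```python
# Sketch of exact Bayes-error computation when class-conditional densities
# are known exactly (flows give exact log-likelihoods; Gaussians stand in).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
priors = np.array([0.5, 0.5])
mus = np.array([-1.0, 1.0])                     # toy class-conditionals N(mu, 1)

k = rng.integers(2, size=200_000)               # sample the marginal:
x = rng.normal(mus[k], 1.0)                     # class, then x | class
dens = priors * norm.pdf(x[:, None], mus, 1.0)  # joint p(k) p(x | k)
post = dens / dens.sum(axis=1, keepdims=True)   # exact class posteriors
print("Bayes error ≈", np.mean(1.0 - post.max(axis=1)))  # ≈ Phi(-1) ≈ 0.159
```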
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)