Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data
- URL: http://arxiv.org/abs/2511.00217v1
- Date: Fri, 31 Oct 2025 19:28:26 GMT
- Title: Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data
- Authors: Mitchell L. Prevett, Francis K. C. Hui, Zhi Yang Tho, A. H. Welsh, Anton H. Westveld,
- Abstract summary: Gradient Boosted Mixed Models (GBMixed) is a framework and algorithm that extends boosting to jointly estimate mean and variance components.<n> Simulations and real-world applications demonstrate accurate recovery of variance components, calibrated prediction intervals, and improved predictive accuracy.
- Score: 0.6361348748202732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting methods achieve high predictive accuracy through nonparametric estimation, but do not accommodate clustered data structures or provide uncertainty quantification. We introduce Gradient Boosted Mixed Models (GBMixed), a framework and algorithm that extends boosting to jointly estimate mean and variance components via likelihood-based gradients. In addition to nonparametric mean estimation, the method models both random effects and residual variances as potentially covariate-dependent functions using flexible base learners such as regression trees or splines, enabling nonparametric estimation while maintaining interpretability. Simulations and real-world applications demonstrate accurate recovery of variance components, calibrated prediction intervals, and improved predictive accuracy relative to standard linear mixed models and nonparametric methods. GBMixed provides heteroscedastic uncertainty quantification and introduces boosting for heterogeneous random effects. This enables covariate-dependent shrinkage for cluster-specific predictions to adapt between population and cluster-level data. Under standard causal assumptions, the framework enables estimation of heterogeneous treatment effects with reliable uncertainty quantification.
Related papers
- Block Empirical Likelihood Inference for Longitudinal Generalized Partially Linear Single-Index Models [9.83545269451662]
Generalized partially linear single-index models (GPLSIMs) provide a flexible and interpretable semiparametric framework for longitudinal outcomes.<n>For repeated measurements, valid inference is challenging because within-subject correlation induces nuisance parameters and variance estimation can be unstable in semiparametric settings.<n>We propose a profile estimating-equation approach based on spline approximation of the unknown link function and construct a subject-level block empirical likelihood (BEL) for joint inference on the parametric coefficients and the single-index direction.
arXiv Detail & Related papers (2026-02-16T18:03:59Z) - Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs.<n>We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness [10.470114319701576]
We introduce a model-free debiasing method for smooth nonparametric regression estimators.<n>We obtain a debiased estimator that satisfies pointwise and uniform risk convergence, along with smoothness, under mild conditions.
arXiv Detail & Related papers (2024-12-28T15:01:19Z) - Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores.<n>We flexibly estimate the joint cumulative distribution function (CDF) of the scores.<n>Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Assumption-Lean Post-Integrated Inference with Surrogate Control Outcomes [6.448728765953916]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using control outcomes.<n>We develop semiparametric inference on projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.<n>The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $sqrt n $-rate.<n>We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent parameters smoothing.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Partially factorized variational inference for high-dimensional mixed models [0.0]
Variational inference is a popular way to perform such computations, especially in the Bayesian context.<n>We show that standard mean-field variational inference dramatically underestimates posterior uncertainty in high-dimensions.<n>We then show how appropriately relaxing the mean-field assumption leads to methods whose uncertainty quantification does not deteriorate in high-dimensions.
arXiv Detail & Related papers (2023-12-20T16:12:37Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
arXiv Detail & Related papers (2020-12-03T18:08:30Z) - Efficient Ensemble Model Generation for Uncertainty Estimation with
Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z) - Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.