Improved prediction rule ensembling through model-based data generation
- URL: http://arxiv.org/abs/2109.13672v1
- Date: Tue, 28 Sep 2021 12:44:10 GMT
- Title: Improved prediction rule ensembling through model-based data generation
- Authors: Benny Markovitch, Marjolein Fokkema
- Abstract summary: Prediction rule ensembles (PRE) provide interpretable prediction models with relatively high accuracy.
PRE obtain a large set of decision rules from a (boosted) decision tree ensemble, and achieve sparsity through application of Lasso-penalized regression.
This article examines the use of surrogate models to improve the performance of PRE, wherein the Lasso regression is trained with the help of a massive dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prediction rule ensembles (PRE) provide interpretable prediction models with
relatively high accuracy. PRE obtain a large set of decision rules from a
(boosted) decision tree ensemble, and achieve sparsity through application of
Lasso-penalized regression. This article examines the use of surrogate models to
improve the performance of PRE, wherein the Lasso regression is trained with the
help of a massive dataset generated by the (boosted) decision tree ensemble.
This use of model-based data generation may improve the stability and
consistency of the Lasso step, thus leading to improved overall performance. We
propose two surrogacy approaches, and evaluate them on simulated and
existing datasets, in terms of sparsity and predictive accuracy. The results
indicate that the use of surrogacy models can substantially improve the sparsity
of PRE, while retaining predictive accuracy, especially through the use of a
nested surrogacy approach.
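To make the pipeline in the abstract concrete (rule generation from a boosted tree ensemble, model-based generation of a large surrogate dataset labelled by that ensemble, and a Lasso fit on rule features), here is a minimal Python sketch. It is an illustration under simplifying assumptions (rules are taken from root-to-leaf paths only, and surrogate predictors are drawn independently from their empirical marginals); the function names are placeholders, not the paper's software.

```python
# Minimal sketch of the pipeline described above: (1) generate rules from a
# boosted tree ensemble, (2) create a large surrogate dataset labelled by that
# ensemble, (3) fit a Lasso on the rule features of the surrogate data.
# Simplifications: one rule per leaf path, surrogate predictors sampled from
# the empirical marginals. Illustrative only, not the authors' implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV


def extract_rules(tree):
    """One rule (a conjunction of split conditions) per root-to-leaf path."""
    t, rules = tree.tree_, []

    def recurse(node, conditions):
        if t.children_left[node] == -1:          # leaf node
            if conditions:
                rules.append(conditions)
            return
        f, thr = t.feature[node], t.threshold[node]
        recurse(t.children_left[node], conditions + [(f, "<=", thr)])
        recurse(t.children_right[node], conditions + [(f, ">", thr)])

    recurse(0, [])
    return rules


def rule_matrix(X, rules):
    """Binary design matrix: entry (i, j) is 1 if row i satisfies rule j."""
    Z = np.ones((X.shape[0], len(rules)))
    for j, rule in enumerate(rules):
        for f, op, thr in rule:
            Z[:, j] *= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return Z


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 2 * (X[:, 1] > 0) + rng.normal(scale=0.5, size=500)

# Step 1: rule generation from a boosted decision tree ensemble.
booster = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)
booster.fit(X, y)
rules = [r for est in booster.estimators_.ravel() for r in extract_rules(est)]

# Step 2: model-based data generation; surrogate labels come from the booster.
n_big = 10000
X_big = np.column_stack([rng.choice(X[:, j], size=n_big) for j in range(X.shape[1])])
y_big = booster.predict(X_big)

# Step 3: Lasso-penalized regression on rule features of the surrogate data.
lasso = LassoCV(cv=5).fit(rule_matrix(X_big, rules), y_big)
print(f"{np.count_nonzero(lasso.coef_)} of {len(rules)} rules selected")
```

The intuition behind the surrogate step is that the Lasso sees far more (model-labelled) observations than the original sample contains, which is what the abstract argues can stabilize rule selection.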
Related papers
- Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL).
It exploits the dependency between a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into an ensemble score with weights.
The efficacy of the proposed methods is rigorously assessed through both simulation studies and a real-world application to risk gene discovery.
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
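Predictive churn, as used in the entry above, is at its core the rate at which two similarly accurate models disagree on the same inputs. The snippet below is a generic illustration of that pairwise quantity (it does not construct a Rashomon set, and the model choices are arbitrary placeholders).

```python
# Minimal illustration of predictive churn: the fraction of examples on which
# two similarly accurate models give different predictions. This is a generic
# pairwise metric, not the Rashomon-set analysis of the paper above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model_a = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
model_b = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

pred_a, pred_b = model_a.predict(X_te), model_b.predict(X_te)
churn = np.mean(pred_a != pred_b)          # disagreement rate between the models
print(f"accuracy A={model_a.score(X_te, y_te):.3f}, "
      f"accuracy B={model_b.score(X_te, y_te):.3f}, churn={churn:.3f}")
```

Two models with near-identical accuracy can still disagree on a nontrivial fraction of points, which is the effect the paper studies.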
- Beyond mirkwood: Enhancing SED Modeling with Conformal Predictions [0.0]
We propose an advanced machine learning-based approach that enhances flexibility and uncertainty quantification in SED fitting.
We incorporate conformalized quantile regression to convert point predictions into error bars, enhancing interpretability and reliability.
arXiv Detail & Related papers (2023-12-21T11:27:20Z)
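Conformalized quantile regression, mentioned in the entry above, follows a standard recipe: fit lower and upper quantile models, measure on a held-out calibration set how far the true values fall outside the predicted band, and widen the band by an empirical quantile of those errors. The sketch below shows that generic recipe with placeholder models, not the paper's pipeline.

```python
# Generic conformalized quantile regression (CQR) sketch: quantile predictions
# are turned into calibrated intervals. Not the paper's code.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

alpha = 0.1                                    # target 90% coverage
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3 * (1 + np.abs(X[:, 0])), size=3000)

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

# Quantile regressors for the lower and upper bounds.
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# Conformity scores on the calibration set: how far the truth falls outside.
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = np.sort(scores)[min(k, len(scores)) - 1]   # finite-sample corrected quantile

# Calibrated interval for a new point: widen both bounds by q.
x_new = np.array([[1.5]])
print(lo.predict(x_new) - q, hi.predict(x_new) + q)
```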
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
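EPIG, as summarised above, scores a candidate point by how much observing its label is expected to reduce uncertainty about predictions at target inputs, rather than about model parameters. The following is a hedged Monte Carlo sketch for a classification ensemble, where the ensemble members stand in for posterior samples; it illustrates the quantity itself, not the paper's estimator.

```python
# Monte Carlo sketch of expected predictive information gain (EPIG) for a
# classification ensemble. probs_pool[s, i, c] and probs_targ[s, j, c] hold
# class probabilities from S ensemble members (acting as posterior samples).
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats, guarding against log(0)."""
    return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=axis)


def epig_scores(probs_pool, probs_targ):
    """EPIG for each pool point: mutual information between its label and the
    labels at target inputs, averaged over those target inputs."""
    S, n_pool, C = probs_pool.shape
    n_targ = probs_targ.shape[1]
    # Joint predictive over (y, y*) pairs, averaged over ensemble members:
    # shape (n_pool, n_targ, C, C).
    joint = np.einsum("sic,sjd->ijcd", probs_pool, probs_targ) / S
    marg_pool = probs_pool.mean(axis=0)          # (n_pool, C)
    marg_targ = probs_targ.mean(axis=0)          # (n_targ, C)
    mi = (entropy(marg_pool)[:, None] + entropy(marg_targ)[None, :]
          - entropy(joint.reshape(n_pool, n_targ, C * C)))
    return mi.mean(axis=1)                       # average over target inputs


# Toy usage: 8 ensemble members, 5 candidate pool points, 64 target samples, 3 classes.
rng = np.random.default_rng(0)
probs_pool = rng.dirichlet(np.ones(3), size=(8, 5))
probs_targ = rng.dirichlet(np.ones(3), size=(8, 64))
print(epig_scores(probs_pool, probs_targ))       # higher = more useful to label
```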
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self-healing (RSH) hybrid prediction model.
It uses the data in its entirety, removing errors and inconsistencies rather than discarding any data.
The proposed method is compared with some of the existing high performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z)
- Explainable boosted linear regression for time series forecasting [0.1876920697241348]
Time series forecasting involves collecting and analyzing past observations to develop a model to extrapolate such observations into the future.
We propose explainable boosted linear regression (EBLR) algorithm for time series forecasting.
arXiv Detail & Related papers (2020-09-18T22:31:42Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.