Improved prediction rule ensembling through model-based data generation
- URL: http://arxiv.org/abs/2109.13672v1
- Date: Tue, 28 Sep 2021 12:44:10 GMT
- Title: Improved prediction rule ensembling through model-based data generation
- Authors: Benny Markovitch, Marjolein Fokkema
- Abstract summary: Prediction rule ensembles (PRE) provide interpretable prediction models with relatively high accuracy.
PRE obtain a large set of decision rules from a (boosted) decision tree ensemble, and achieve sparsity through application of Lasso-penalized regression.
This article examines the use of surrogate models to improve the performance of PRE, wherein the Lasso regression is trained with the help of a massive dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prediction rule ensembles (PRE) provide interpretable prediction models with
relatively high accuracy. PRE obtain a large set of decision rules from a
(boosted) decision tree ensemble, and achieve sparsity through application of
Lasso-penalized regression. This article examines the use of surrogate models to
improve the performance of PRE, wherein the Lasso regression is trained with the
help of a massive dataset generated by the (boosted) decision tree ensemble.
This use of model-based data generation may improve the stability and
consistency of the Lasso step, thus leading to improved overall performance. We
propose two surrogacy approaches and evaluate them on simulated and
existing datasets in terms of sparsity and predictive accuracy. The results
indicate that the use of surrogacy models can substantially improve the sparsity
of PRE while retaining predictive accuracy, especially through the use of a
nested surrogacy approach.
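The abstract describes the pipeline only at a high level, so here is a minimal pure-Python sketch of the basic idea under explicit assumptions. It is not the authors' implementation and does not reproduce either of their two surrogacy approaches (including the nested one). It fits a boosted ensemble, turns its splits into binary rules, and fits a Lasso, (1/2n)||y - b0 - Zw||^2 + lam*||w||_1 over the rule matrix Z, once on the original responses and once on a larger dataset labelled by the ensemble itself. Assumptions: the ensemble uses depth-1 trees (stumps), so every rule is a single condition, whereas PRE typically extracts conjunctive rules from deeper trees; the data generator (resampling training rows with Gaussian jitter, `noise=0.1`) is a hypothetical choice; and all function names (`fit_boosted_stumps`, `lasso_cd`, `generate_surrogate_data`, etc.) are illustrative only.

```python
import math
import random

def fit_stump(X, r, n_cuts=10):
    """Least-squares regression stump on residuals r -> (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        vals = sorted(x[j] for x in X)
        # Candidate thresholds at evenly spaced empirical quantiles (a simplification).
        cuts = {vals[k * (len(vals) - 1) // n_cuts] for k in range(1, n_cuts)}
        for t in cuts:
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_boosted_stumps(X, y, n_trees=30, lr=0.1):
    """Squared-loss gradient boosting: each stump is fit to the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, resid)
        stumps.append((j, t, ml, mr))
        pred = [p + lr * (ml if x[j] <= t else mr) for p, x in zip(pred, X)]
    return f0, lr, stumps

def ensemble_predict(model, x):
    f0, lr, stumps = model
    return f0 + lr * sum(ml if x[j] <= t else mr for j, t, ml, mr in stumps)

def extract_rules(model):
    """Each stump yields the rules 'x_j <= t' (le=True) and 'x_j > t' (le=False), deduplicated."""
    return sorted({(j, t, le) for j, t, _, _ in model[2] for le in (True, False)})

def rule_features(rules, X):
    """Binary design matrix: 1.0 where an observation satisfies a rule."""
    return [[1.0 if (x[j] <= t) == le else 0.0 for j, t, le in rules] for x in X]

def lasso_cd(Z, y, lam, n_iter=50):
    """Coordinate descent for (1/2n) * ||y - b0 - Z w||^2 + lam * ||w||_1."""
    n, p = len(Z), len(Z[0])
    w = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    col_sq = [sum(z[k] ** 2 for z in Z) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue  # rule never fires on this data
            rho = sum(z[k] * (r + z[k] * w[k]) for z, r in zip(Z, resid)) / n
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_sq[k]
            if new != w[k]:
                resid = [r - z[k] * (new - w[k]) for z, r in zip(Z, resid)]
                w[k] = new
        shift = sum(resid) / n  # refit the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, w

def generate_surrogate_data(model, X, n_new, noise=0.1, seed=0):
    """Hypothetical generator: resample training rows, add Gaussian jitter,
    and label every synthetic point with the ensemble's own prediction."""
    rng = random.Random(seed)
    X_new = [[v + rng.gauss(0.0, noise) for v in rng.choice(X)] for _ in range(n_new)]
    return X_new, [ensemble_predict(model, x) for x in X_new]

rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [(1.0 if x[0] > 0 else 0.0) + 0.5 * x[1] + rng.gauss(0.0, 0.3) for x in X]

model = fit_boosted_stumps(X, y)
rules = extract_rules(model)
# Standard PRE: the Lasso is fit to the original, noisy responses.
b_orig, w_orig = lasso_cd(rule_features(rules, X), y, lam=0.01)
# Surrogate variant: the Lasso is fit to a larger dataset labelled by the ensemble.
X_s, y_s = generate_surrogate_data(model, X, n_new=500)
b_sur, w_sur = lasso_cd(rule_features(rules, X_s), y_s, lam=0.01)
print(len(rules), sum(wk != 0 for wk in w_orig), sum(wk != 0 for wk in w_sur))
```

The only difference between the two Lasso fits is the data passed to `lasso_cd`: the surrogate fit sees noise-free, ensemble-generated responses on more points (kept at 500 here only so the pure-Python demo runs quickly, far from a "massive" dataset). Whether the surrogate fit ends up sparser depends on the data, the generator, and `lam`; the sketch makes no claim about the results reported in the paper.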
Related papers
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763] (2024-11-30T10:56:30Z)
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
- Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307] (2024-03-28T22:45:38Z)
We propose a model-agnostic framework for finding sequences of models that are stable across retraining iterations.
We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models.
We find that, on average, a 2% reduction in predictive power leads to a 30% improvement in stability.
- Predictive Churn with the Set of Good Models [64.05949860750235] (2024-02-12T16:15:25Z)
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
- Beyond mirkwood: Enhancing SED Modeling with Conformal Predictions [0.0] (2023-12-21T11:27:20Z)
We propose an advanced machine learning-based approach that enhances flexibility and uncertainty in SED fitting.
We incorporate conformalized quantile regression to convert point predictions into error bars, enhancing interpretability and reliability.
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667] (2023-04-20T07:29:23Z)
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
- Prediction-Oriented Bayesian Active Learning [51.426960808684655] (2023-04-17T10:59:57Z)
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919] (2023-02-23T18:57:14Z)
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
- Robust self-healing prediction model for high dimensional data [0.685316573653194] (2022-10-04T17:55:50Z)
This work proposes a robust self healing (RSH) hybrid prediction model.
It functions by using the data in its entirety by removing errors and inconsistencies from it rather than discarding any data.
The proposed method is compared with some of the existing high performing models and the results are analyzed.
- Explainable boosted linear regression for time series forecasting [0.1876920697241348] (2020-09-18T22:31:42Z)
Time series forecasting involves collecting and analyzing past observations to develop a model to extrapolate such observations into the future.
We propose explainable boosted linear regression (EBLR) algorithm for time series forecasting.
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862] (2020-06-26T13:50:19Z)
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.