Ensemble learning for predictive uncertainty estimation with application to the correction of satellite precipitation products
- URL: http://arxiv.org/abs/2403.10567v2
- Date: Mon, 06 Jan 2025 19:20:27 GMT
- Title: Ensemble learning for predictive uncertainty estimation with application to the correction of satellite precipitation products
- Authors: Georgia Papacharalampous, Hristos Tyralis, Nikolaos Doulamis, Anastasios Doulamis
- Abstract summary: We introduce nine quantile-based ensemble learners and present the first application of these learners to large precipitation datasets.
We employ a novel feature engineering strategy, reducing predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation.
Ensemble learning with QR and QRNN yielded the best results across quantile levels ranging from 0.025 to 0.975, outperforming the reference method by 3.91% to 8.95%.
- Score: 3.8623569699070353
- Abstract: Predictions in the form of probability distributions are crucial for effective decision-making. Quantile regression enables such predictions within spatial prediction settings that aim to create improved precipitation datasets by merging remote sensing and gauge data. However, ensemble learning of quantile regression algorithms remains unexplored in this context and, at the same time, has not been substantially developed in the broader machine learning research landscape. Here, we introduce nine quantile-based ensemble learners and address the aforementioned gap in precipitation dataset creation by presenting the first application of these learners to large precipitation datasets. We employed a novel feature engineering strategy, reducing predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation. Our ensemble learners include six ensemble learning methods and three simple methods (mean, median, best combiner), combining six individual algorithms: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). These algorithms serve as both base learners and combiners within different ensemble learning methods. We evaluated performance against a reference method (QR) using quantile scoring functions on a large dataset comprising 15 years of monthly gauge-measured and satellite precipitation in the contiguous United States (CONUS). Ensemble learning with QR and QRNN yielded the best results across quantile levels ranging from 0.025 to 0.975, outperforming the reference method by 3.91% to 8.95%. This demonstrates the potential of ensemble learning to improve probabilistic spatial predictions.
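The abstract names a quantile scoring function, three simple combiners (mean, median, best), and a percentage improvement over a reference method. The following is a minimal Python sketch of those ideas, not the authors' implementation. It assumes that the scoring function is the standard quantile (pinball) loss, that the "best" combiner selects the base learner with the lowest validation score, and that improvement is measured as the relative reduction in mean score. All function names are hypothetical.

```python
import numpy as np


def quantile_score(y, q_pred, tau):
    """Mean quantile (pinball) loss of predicted tau-quantiles q_pred."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    # tau * diff when the observation is above the prediction,
    # (tau - 1) * diff when it is below.
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def mean_combiner(base_preds):
    """Average the base learners' quantile predictions per sample."""
    return np.mean(np.asarray(base_preds, dtype=float), axis=0)


def median_combiner(base_preds):
    """Take the per-sample median of the base learners' predictions."""
    return np.median(np.asarray(base_preds, dtype=float), axis=0)


def best_combiner(val_preds, y_val, test_preds, tau):
    """Assumed definition: keep the base learner with the lowest
    quantile score on a validation set and use its test predictions."""
    scores = [quantile_score(y_val, p, tau) for p in val_preds]
    return np.asarray(test_preds, dtype=float)[int(np.argmin(scores))]


def relative_improvement(score, ref_score):
    """Assumed skill measure: percentage reduction of the mean score
    relative to the reference method."""
    return 100.0 * (ref_score - score) / ref_score
```

Each row of `base_preds` holds one base learner's predictions of the same quantile level for all samples, so every combiner returns one prediction per sample. Stacking-type ensemble learners would replace these fixed rules with a combiner algorithm trained on the base learners' predictions.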
Related papers
- Neural Conformal Control for Time Series Forecasting [54.96087475179419]
We introduce a neural network conformal prediction method for time series that enhances adaptivity in non-stationary environments.
Our approach acts as a neural controller designed to achieve desired target coverage, leveraging auxiliary multi-view data with neural network encoders.
We empirically demonstrate significant improvements in coverage and probabilistic accuracy, and find that our method is the only one that combines good calibration with consistency in prediction intervals.
arXiv Detail & Related papers (2024-12-24T03:56:25Z) - Amortized Bayesian Local Interpolation NetworK: Fast covariance parameter estimation for Gaussian Processes [0.04660328753262073]
We propose an Amortized Bayesian Local Interpolation NetworK for fast covariance parameter estimation.
The fast prediction time of these networks allows us to bypass the matrix inversion step, creating large computational speedups.
We show significant increases in computational efficiency over comparable scalable GP methodology.
arXiv Detail & Related papers (2024-11-10T01:26:16Z) - Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Combinations of distributional regression algorithms with application in uncertainty estimation of corrected satellite precipitation products [3.8623569699070353]
We introduce the concept of distributional regression in precipitation dataset creation.
New ensemble learning methods can be valuable not only for spatial prediction but also for other prediction problems.
Stacking was shown to be superior to individual methods at most quantile levels when evaluated with the quantile loss function.
arXiv Detail & Related papers (2024-06-29T05:58:00Z) - Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Uncertainty estimation of machine learning spatial precipitation predictions from satellite data [3.8623569699070353]
Merging satellite and gauge data with machine learning produces high-resolution precipitation datasets.
We address the gap of how to optimally provide such estimates by benchmarking six algorithms.
We propose a suite of machine learning algorithms for estimating uncertainty in spatial data prediction.
arXiv Detail & Related papers (2023-11-13T17:55:28Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both theoretical and experimental perspectives.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Ensemble learning for blending gridded satellite and gauge-measured precipitation data [4.2193475197905705]
This study proposes 11 new ensemble learners for improving the accuracy of satellite precipitation products.
We apply the ensemble learners to monthly data from the PERSIANN and IMERG gridded datasets.
We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database.
arXiv Detail & Related papers (2023-07-09T17:54:46Z) - Comparison of machine learning algorithms for merging gridded satellite and earth-observed precipitation data [7.434517639563671]
We use monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2.
Results suggest that extreme gradient boosting and random forests are the most accurate in terms of the squared error scoring function.
arXiv Detail & Related papers (2022-12-17T09:39:39Z) - Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.