Related papers: Comparison of machine learning algorithms for merging gridded satellite and earth-observed precipitation data

URL: http://arxiv.org/abs/2301.01252v1
Date: Sat, 17 Dec 2022 09:39:39 GMT
Title: Comparison of machine learning algorithms for merging gridded satellite and earth-observed precipitation data
Authors: Georgia Papacharalampous, Hristos Tyralis, Anastasios Doulamis, Nikolaos Doulamis
Abstract summary: We use monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2. Results suggest that extreme gradient boosting and random forests are the most accurate in terms of the squared error scoring function.
Score: 7.434517639563671
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Gridded satellite precipitation datasets are useful in hydrological applications as they cover large regions with high density. However, they are not accurate in the sense that they do not agree with ground-based measurements. An established means for improving their accuracy is to correct them by adopting machine learning algorithms. The problem is defined as a regression setting, in which the ground-based measurements have the role of the dependent variable and the satellite data are the predictor variables, together with topography factors (e.g., elevation). Most studies of this kind involve a limited number of machine learning algorithms, and are conducted at a small region and for a limited time period. Thus, the results obtained through them are of local importance and do not provide more general guidance and best practices. To provide results that are generalizable and to contribute to the delivery of best practices, we here compare eight state-of-the-art machine learning algorithms in correcting satellite precipitation data for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) gridded dataset, together with monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The results suggest that extreme gradient boosting (XGBoost) and random forests are the most accurate in terms of the squared error scoring function. The remaining algorithms can be ordered as follows from the best to the worst ones: Bayesian regularized feed-forward neural networks, multivariate adaptive polynomial splines (poly-MARS), gradient boosting machines (gbm), multivariate adaptive regression splines (MARS), feed-forward neural networks and linear regression.
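The regression setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the values below are synthetic, the names (`fit_linear_regression`, `mean_squared_error`, the coefficients used to simulate gauge readings) are hypothetical, and only linear regression, the simplest of the eight compared algorithms, is shown. Gauge measurements play the role of the dependent variable, satellite precipitation and elevation are the predictors, and the squared error scoring function is used for evaluation.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (intercept added here)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, rows):
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data (assumption): satellite precipitation in mm/month,
# elevation in km, and a made-up relation generating the "gauge" values.
rng = random.Random(0)
rows, gauge = [], []
for _ in range(400):
    satellite = rng.uniform(0.0, 200.0)
    elevation = rng.uniform(0.0, 3.0)
    rows.append((satellite, elevation))
    gauge.append(10.0 + 0.7 * satellite + 8.0 * elevation + rng.gauss(0.0, 5.0))

# Hold out the last 100 samples for evaluation.
train_rows, test_rows = rows[:300], rows[300:]
train_y, test_y = gauge[:300], gauge[300:]

coefs = fit_linear_regression(train_rows, train_y)
mse_raw = mean_squared_error(test_y, [r[0] for r in test_rows])  # uncorrected satellite
mse_corrected = mean_squared_error(test_y, predict(coefs, test_rows))

print("MSE of raw satellite values:", round(mse_raw, 2))
print("MSE after regression correction:", round(mse_corrected, 2))
```

In a comparison like the one the abstract describes, each candidate algorithm would be fitted on the same training split and ranked by its held-out squared error; the sketch does this for a single learner against the uncorrected satellite values.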

Related papers

ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search [53.40810298627443]
ReGUIDE is a framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism. Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks.
arXiv Detail & Related papers (2025-05-21T08:36:18Z)
Ensemble learning for uncertainty estimation with application to the correction of satellite precipitation products [3.8623569699070353]
We introduce nine quantile-based ensemble learners and address the gap in precipitation dataset creation. We employ a novel feature engineering strategy, which reduces the number of predictors by using distance-weighted satellite precipitation at relevant locations. Ensemble learning with QR and QRNN yielded the best results across the various investigated quantile levels, which range from 0.025 to 0.975, outperforming the reference method by 3.91% to 8.95%.
arXiv Detail & Related papers (2024-03-14T17:45:56Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs). Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs). Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Long-term drought prediction using deep neural networks based on geospatial weather data [75.38539438000072]
High-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance. We tackle drought data by introducing a systematic end-to-end approach. A key finding is the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts.
arXiv Detail & Related papers (2023-09-12T13:28:06Z)
Ensemble learning for blending gridded satellite and gauge-measured precipitation data [4.2193475197905705]
This study proposes 11 new ensemble learners for improving the accuracy of satellite precipitation products. We apply the ensemble learners to monthly data from the PERSIANN and IMERG gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database.
arXiv Detail & Related papers (2023-07-09T17:54:46Z)
Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale [7.434517639563671]
Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. Tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost.
arXiv Detail & Related papers (2022-12-31T11:14:45Z)
Simple and Effective Augmentation Methods for CSI Based Indoor Localization [37.3026733673066]
We propose two algorithms for channel state information based indoor localization motivated by physical considerations. As little as 10% of the original dataset size is enough to get the same performance as the original dataset. If we further augment the dataset with the proposed techniques, test accuracy is improved more than three-fold.
arXiv Detail & Related papers (2022-11-19T20:27:46Z)
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function. We study the impact of the location of the collocation points on the trainability of these models. We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
On the modern deep learning approaches for precipitation downscaling [0.0]
We carry out DL-based downscaling to estimate the local precipitation data from the India Meteorological Department (IMD). To test the efficacy of different DL approaches, we apply four different methods of downscaling and evaluate their performance. The results indicate that SR-GAN is the best method for precipitation data downscaling.
arXiv Detail & Related papers (2022-07-02T11:57:39Z)
Physics Informed Shallow Machine Learning for Wind Speed Prediction [66.05661813632568]
We analyze a massive dataset of wind measured from anemometers located at 10 m height in 32 locations in Italy. We train supervised learning algorithms using the past history of wind to predict its value at a future time. We find that the optimal design as well as its performance vary with the location.
arXiv Detail & Related papers (2022-04-01T14:55:10Z)
Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods. We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
Learning to Detect Fortified Areas [0.0]
We consider the problem of classifying which areas of a given surface are fortified by, for instance, roads, sidewalks, parking spaces, paved driveways and terraces. We propose an algorithmic solution by designing a neural net embedding architecture that transforms data from all the different sensor systems into a new common representation.
arXiv Detail & Related papers (2021-05-26T08:03:42Z)
Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression. Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice. A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.