Comparison of tree-based ensemble algorithms for merging satellite and
earth-observed precipitation data at the daily time scale
- URL: http://arxiv.org/abs/2301.01214v1
- Date: Sat, 31 Dec 2022 11:14:45 GMT
- Authors: Georgia Papacharalampous, Hristos Tyralis, Anastasios Doulamis,
Nikolaos Doulamis
- Abstract summary: Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density.
Machine and statistical learning regression algorithms are regularly utilized in this endeavour.
Tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost.
- Score: 7.434517639563671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Merging satellite products and ground-based measurements is often required
for obtaining precipitation datasets that simultaneously cover large regions
with high density and are more accurate than pure satellite precipitation
products. Machine and statistical learning regression algorithms are regularly
utilized in this endeavour. At the same time, tree-based ensemble algorithms
for regression are adopted in various fields for solving algorithmic problems
with high accuracy and low computational cost. The latter can constitute a
crucial factor for selecting algorithms for satellite precipitation product
correction at the daily and finer time scales, where the size of the datasets
is particularly large. Still, information on which tree-based ensemble
algorithm to select in such a case for the contiguous United States (US) is
missing from the literature. In this work, we conduct an extensive comparison
between three tree-based ensemble algorithms, specifically random forests,
gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in
the context of interest. We use daily data from the PERSIANN (Precipitation
Estimation from Remotely Sensed Information using Artificial Neural Networks)
and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets.
We also use earth-observed precipitation data from the Global Historical
Climatology Network daily (GHCNd) database. The experiments refer to the entire
contiguous US and additionally include the application of the linear regression
algorithm for benchmarking purposes. The results suggest that XGBoost is the
best-performing tree-based ensemble algorithm among those compared. They also
suggest that IMERG is more useful than PERSIANN in the context investigated.
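The abstract does not describe an implementation, so the following is only a minimal sketch of the general idea behind gradient boosting for satellite product correction, not the authors' method or the gbm/XGBoost libraries. It assumes synthetic data: a hypothetical gauge series and two noisy satellite-like predictors (`sat_a`, `sat_b`). All function names, parameters and data-generating choices are illustrative assumptions.

```python
import random

LEARNING_RATE = 0.1  # shrinkage applied to every stump (assumed value)


def mse(y, pred):
    """Mean squared error between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def fit_stump(X, r):
    """Least-squares regression stump: best single split over all features."""
    n = len(r)
    total = sum(r)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            left_mean = left_sum / k
            right_mean = (total - left_sum) / (n - k)
            # Minimising the squared error equals maximising this term.
            gain = k * left_mean ** 2 + (n - k) * right_mean ** 2
            if best is None or gain > best[0]:
                best = (gain, j, lo, left_mean, right_mean)
    if best is None:  # every feature is constant: fall back to the mean
        return (0, float("inf"), total / n, 0.0)
    return best[1:]


def predict_stump(stump, x):
    j, thr, left_mean, right_mean = stump
    return left_mean if x[j] <= thr else right_mean


def boost(X, y, rounds=50, lr=LEARNING_RATE):
    """Fit stumps sequentially to the residuals of the current ensemble."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, history = [], []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, resid)
        stumps.append(stump)
        pred = [pi + lr * predict_stump(stump, xi) for pi, xi in zip(pred, X)]
        history.append(mse(y, pred))  # training error after this round
    return base, stumps, history


def predict(base, stumps, x, lr=LEARNING_RATE):
    return base + lr * sum(predict_stump(s, x) for s in stumps)


def make_data(n):
    """Synthetic stand-in for gauge observations and two satellite products."""
    X, y = [], []
    for _ in range(n):
        gauge = random.expovariate(0.2)  # hypothetical daily total
        sat_a = max(0.0, 0.6 * gauge + random.gauss(0, 2.0))
        sat_b = max(0.0, gauge ** 0.8 + random.gauss(0, 1.0))
        X.append([sat_a, sat_b])
        y.append(gauge)
    return X, y


random.seed(0)
X_train, y_train = make_data(150)
X_test, y_test = make_data(150)
base, stumps, history = boost(X_train, y_train)
test_pred = [predict(base, stumps, x) for x in X_test]
print("train MSE, mean only:", mse(y_train, [base] * len(y_train)))
print("train MSE, boosted:  ", history[-1])
print("test MSE, boosted:   ", mse(y_test, test_pred))
```

Each round fits one stump to the current residuals and adds a shrunken copy of it to the ensemble, so the training error cannot increase from one round to the next. Random forests differ in that their trees are grown independently on resampled data and then averaged.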
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
- Ensemble learning for blending gridded satellite and gauge-measured
precipitation data [4.2193475197905705]
This study proposes 11 new ensemble learners for improving the accuracy of satellite precipitation products.
We apply the ensemble learners to monthly data from the PERSIANN and IMERG gridded datasets.
We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database.
arXiv Detail & Related papers (2023-07-09T17:54:46Z)
- Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! [100.19080749267316]
The "Sparsity May Cry" Benchmark (SMC-Bench) is a carefully curated collection of 4 diverse tasks with 10 datasets.
SMC-Bench is designed to favor and encourage the development of more scalable and generalizable sparse algorithms.
arXiv Detail & Related papers (2023-03-03T18:47:21Z)
- GBMST: An Efficient Minimum Spanning Tree Clustering Based on
Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST).
We construct coarse-grained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority".
Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
- Comparison of machine learning algorithms for merging gridded satellite
and earth-observed precipitation data [7.434517639563671]
We use monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2.
Results suggest that extreme gradient boosting and random forests are the most accurate in terms of the squared error scoring function.
arXiv Detail & Related papers (2022-12-17T09:39:39Z)
- SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series
Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees.
We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves.
In our evaluation, the proposed tree and forest models are able to achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
arXiv Detail & Related papers (2022-11-16T04:30:42Z)
- Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning.
We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z)
- Physics Informed Shallow Machine Learning for Wind Speed Prediction [66.05661813632568]
We analyze a massive dataset of wind measured from anemometers located at 10 m height in 32 locations in Italy.
We train supervised learning algorithms using the past history of wind to predict its value at a future time.
We find that the optimal design as well as its performance vary with the location.
arXiv Detail & Related papers (2022-04-01T14:55:10Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Clustering with Fast, Automated and Reproducible assessment applied to
longitudinal neural tracking [3.817161834189992]
C-FAR is a novel method for Fast, Automated and Reproducible assessment of hierarchical clustering algorithms simultaneously.
Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees.
Our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons.
arXiv Detail & Related papers (2020-03-19T01:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.