Related papers: Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale

URL: http://arxiv.org/abs/2301.01214v1
Date: Sat, 31 Dec 2022 11:14:45 GMT
Title: Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale
Authors: Georgia Papacharalampous, Hristos Tyralis, Anastasios Doulamis, Nikolaos Doulamis
Abstract summary: Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. Tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost.
Score: 7.434517639563671
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost. The latter can constitute a crucial factor for selecting algorithms for satellite precipitation product correction at the daily and finer time scales, where the size of the datasets is particularly large. Still, information on which tree-based ensemble algorithm to select in such a case for the contiguous United States (US) is missing from the literature. In this work, we conduct an extensive comparison between three tree-based ensemble algorithms, specifically random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in the context of interest. We use daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments refer to the entire contiguous US and additionally include the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. They also suggest that IMERG is more useful than PERSIANN in the context investigated.

Related papers

Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs). We identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring. We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
arXiv Detail & Related papers (2025-02-16T16:12:01Z)
Score-matching-based Structure Learning for Temporal Data on Networks [17.166362605356074]
Causal discovery is a crucial initial step in establishing causality from empirical data and background knowledge. Current score-matching-based algorithms are primarily designed to analyze independent and identically distributed (i.i.d.) data. We have developed a new parent-finding subroutine for leaf nodes in DAGs, significantly accelerating the most time-consuming part of the process: the pruning step.
arXiv Detail & Related papers (2024-12-10T12:36:35Z)
LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
Ensemble learning for blending gridded satellite and gauge-measured precipitation data [4.2193475197905705]
This study proposes 11 new ensemble learners for improving the accuracy of satellite precipitation products. We apply the ensemble learners to monthly data from the PERSIANN and IMERG gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database.
arXiv Detail & Related papers (2023-07-09T17:54:46Z)
Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! [100.19080749267316]
"Sparsity May Cry" Benchmark (SMC-Bench) is a collection of carefully-curated 4 diverse tasks with 10 datasets. SMC-Bench is designed to favor and encourage the development of more scalable and generalizable sparse algorithms.
arXiv Detail & Related papers (2023-03-03T18:47:21Z)
GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST). We construct coarse-grained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority". Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
Comparison of machine learning algorithms for merging gridded satellite and earth-observed precipitation data [7.434517639563671]
We use monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2. Results suggest that extreme gradient boosting and random forests are the most accurate in terms of the squared error scoring function.
arXiv Detail & Related papers (2022-12-17T09:39:39Z)
SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees. We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves. In our evaluation, the proposed tree and forest models are able to achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
arXiv Detail & Related papers (2022-11-16T04:30:42Z)
Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning. We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z)
Physics Informed Shallow Machine Learning for Wind Speed Prediction [66.05661813632568]
We analyze a massive dataset of wind measured from anemometers located at 10 m height in 32 locations in Italy. We train supervised learning algorithms using the past history of wind to predict its value at a future time. We find that the optimal design as well as its performance vary with the location.
arXiv Detail & Related papers (2022-04-01T14:55:10Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models. The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning. We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking [3.817161834189992]
C-FAR is a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. Our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons.
arXiv Detail & Related papers (2020-03-19T01:33:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.