Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling
- URL: http://arxiv.org/abs/2502.03480v1
- Date: Mon, 27 Jan 2025 23:02:05 GMT
- Title: Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling
- Authors: Diana Koldasbayeva, Alexey Zaytsev
- Abstract summary: Species Distribution Models (SDMs) often suffer from spatial autocorrelation (SAC), leading to biased performance estimates. We tested cross-validation (CV) strategies - random splits, spatial blocking with varied distances, environmental (ENV) clustering, and a novel spatio-temporal method.
- Score: 2.6862667248315386
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Species Distribution Models (SDMs) often suffer from spatial autocorrelation (SAC), leading to biased performance estimates. We tested cross-validation (CV) strategies - random splits, spatial blocking with varied distances, environmental (ENV) clustering, and a novel spatio-temporal method - under two proposed training schemes: LAST FOLD, widely used in spatial CV at the cost of data loss, and RETRAIN, which maximizes data usage but risks reintroducing SAC. LAST FOLD consistently yielded lower errors and stronger correlations. Spatial blocking at an optimal distance (SP 422) and ENV performed best, achieving Spearman and Pearson correlations of 0.485 and 0.548, respectively, although ENV may be unsuitable for long-term forecasts involving major environmental shifts. A spatio-temporal approach yielded modest benefits in our moderately variable dataset, but may excel with stronger temporal changes. These findings highlight the need to align CV approaches with the spatial and temporal structure of SDM data, ensuring rigorous validation and reliable predictive outcomes.
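As an illustration of the spatial blocking idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes square grid blocks over projected coordinates, and the names `block_id`, `spatial_block_folds`, `train_test_indices`, and the `block_size` value are hypothetical. The unit and meaning of the paper's "SP 422" distance are not stated here, so the toy block size below is arbitrary.

```python
import math
import random
from collections import defaultdict


def block_id(x, y, block_size):
    """Map a coordinate to the index of the square block that contains it."""
    return (math.floor(x / block_size), math.floor(y / block_size))


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign each point to a fold so that a whole block shares one fold.

    coords     : list of (x, y) tuples in projected units (assumed, e.g. km)
    block_size : side length of a block in the same units (hypothetical knob)
    n_folds    : number of cross-validation folds
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[block_id(x, y, block_size)].append(i)

    # Shuffle the blocks, then deal them out to folds round-robin.
    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)
    fold_of_point = [None] * len(coords)
    for k, key in enumerate(block_keys):
        for i in blocks[key]:
            fold_of_point[i] = k % n_folds
    return fold_of_point


def train_test_indices(fold_of_point, test_fold):
    """Split point indices into training and test sets for one fold."""
    train = [i for i, f in enumerate(fold_of_point) if f != test_fold]
    test = [i for i, f in enumerate(fold_of_point) if f == test_fold]
    return train, test


# Toy data: 200 random points in a 1000 x 1000 square.
rng = random.Random(42)
coords = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(200)]
folds = spatial_block_folds(coords, block_size=250, n_folds=4)
train_idx, test_idx = train_test_indices(folds, test_fold=0)
```

Because folds are assigned per block rather than per point, nearby observations tend to land on the same side of the split, which is what limits leakage through spatial autocorrelation. Points close to a block boundary can still have neighbours in another fold.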
Related papers
- Weather-Related Crash Risk Forecasting: A Deep Learning Approach for Heterogenous Spatiotemporal Data [0.0]
This study introduces a deep learning-based framework for forecasting weather-related traffic crash risk using heterogeneous road data. North Carolina was selected as the study area due to its diverse weather conditions, with historical crash, weather, and traffic data aggregated at 5-mi by 5-mi grid resolution.
arXiv Detail & Related papers (2026-03-04T19:35:10Z) - Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking [51.56484100374058]
We evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. Forecast accuracy is measured using MASE and sMAPE and compared against simple baselines such as drift and seasonal naive, as well as statistical models.
arXiv Detail & Related papers (2026-02-03T16:01:22Z) - Optimal Look-back Horizon for Time Series Forecasting in Federated Learning [26.070107882914844]
This paper presents a principled framework for adaptive horizon selection in federated time series forecasting. We derive a decomposition of the forecasting loss into a Bayesian term, which reflects irreducible uncertainty. We prove that the total forecasting loss is minimized at the smallest horizon where the irreducible loss starts to saturate, while the approximation loss continues to rise.
arXiv Detail & Related papers (2025-11-16T21:46:54Z) - How Different from the Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning [15.102926671713668]
We propose ST-SSDL, a spatio-temporal time series forecasting framework. It discretizes the latent space using learnable prototypes that represent typical spatio-temporal patterns. Experiments show that ST-SSDL consistently outperforms state-of-the-art baselines across multiple metrics.
arXiv Detail & Related papers (2025-10-06T15:21:13Z) - A theoretical framework for self-supervised contrastive learning for continuous dependent data [86.50780641055258]
Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in the field of computer vision. We propose a novel theoretical framework for contrastive SSL tailored to semantic independence between samples. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of 4.17% and 2.08%, respectively.
arXiv Detail & Related papers (2025-06-11T14:23:47Z) - Adaptive Deadline and Batch Layered Synchronized Federated Learning [66.93447103966439]
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation.
arXiv Detail & Related papers (2025-05-29T19:59:18Z) - Leveraging Multivariate Long-Term History Representation for Time Series Forecasting [6.661358934189792]
We propose a framework called Long-term Multivariate History Representation (LMHR) for MTS forecasting. LMHR encodes the long-term history into segment-level contextual representations and reduces point-level noise. It consistently improves prediction accuracy by 9.8% on the top 10% of rapidly changing patterns.
arXiv Detail & Related papers (2025-05-20T03:46:36Z) - Inverse Reinforcement Learning for Minimum-Exposure Paths in Spatiotemporally Varying Scalar Fields [49.1574468325115]
We consider a problem of synthesizing datasets of minimum exposure paths that resemble a training dataset of such paths.
The main contribution of this paper is an inverse reinforcement learning (IRL) model to solve this problem.
We find that the proposed IRL model provides excellent performance in synthesizing paths from initial conditions not seen in the training dataset.
arXiv Detail & Related papers (2025-03-09T13:30:11Z) - Spatiotemporal Forecasting in Climate Data Using EOFs and Machine Learning Models: A Case Study in Chile [0.0]
This study employs an innovative and efficient hybrid methodology that integrates machine learning (ML) methods for time series forecasting with established statistical techniques.
The methodology is applied to a grid of climate data covering the territory of Chile.
arXiv Detail & Related papers (2025-02-21T01:34:38Z) - Cross Space and Time: A Spatio-Temporal Unitized Model for Traffic Flow Forecasting [16.782154479264126]
Predicting spatio-temporal traffic flow presents challenges due to complex interactions between spatial and temporal factors.
Existing approaches address these dimensions in isolation, neglecting their critical interdependencies.
In this paper, we introduce the Adaptive Spatio-Temporal Unitized Cell (ASTUC), a unified framework designed to capture both spatial and temporal dependencies.
arXiv Detail & Related papers (2024-11-14T07:34:31Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
In response to increasing privacy concerns, we propose a Parameter-efficient Federated Anomaly Detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z) - Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z) - Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life Prediction [3.081898819471624]
The Remaining Useful Life (RUL) prediction aims at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device.
To overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, named Dual-Mixer model, is proposed.
The effectiveness of the proposed method is validated through comparisons with other latest research works on the C-MAPSS dataset.
arXiv Detail & Related papers (2024-01-29T14:38:44Z) - Joint model for longitudinal and spatio-temporal survival data [3.8448145915428644]
We propose the Spatio-Temporal Joint Model (STJM) to capture spatial and temporal effects and their interaction.
We apply the STJM to predict the time to full prepayment on a large dataset of 57,258 US mortgage borrowers with more than 2.5 million observations.
arXiv Detail & Related papers (2023-11-07T14:05:14Z) - Long-term drought prediction using deep neural networks based on geospatial weather data [75.38539438000072]
High-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance.
We tackle drought prediction by introducing a systematic end-to-end approach.
Key findings are the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts.
arXiv Detail & Related papers (2023-09-12T13:28:06Z) - SaSDim: Self-Adaptive Noise Scaling Diffusion Model for Spatial Time Series Imputation [22.881248410404126]
We propose a self-adaptive noise scaling diffusion model named SaSDim to perform spatial time series imputation.
Specifically, we propose a new loss function that scales the noise to a similar intensity, and an across spatial-temporal global convolution module.
arXiv Detail & Related papers (2023-09-05T06:51:39Z) - Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
It is common that real-world time series data are recorded in a short time period, which results in a big gap between the deep model and the limited and noisy time series.
We propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z) - Estimating the Prediction Performance of Spatial Models via Spatial k-Fold Cross Validation [1.7205106391379026]
In machine learning one often assumes the data are independent when evaluating model performance.
Spatial autocorrelation (SAC) causes the standard cross validation (CV) methods to produce optimistically biased prediction performance estimates.
We propose a modified version of the CV method called spatial k-fold cross validation (SKCV) which provides a useful estimate for model prediction performance without optimistic bias due to SAC.
arXiv Detail & Related papers (2020-05-28T19:55:18Z)
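The summary of the last entry above does not spell out how SKCV removes the optimistic bias. One common way to do so, sketched here as an assumption rather than as that paper's exact procedure, is to discard any training point that lies within a buffer distance of a test point. The function name `buffered_kfold` and the `radius` parameter are hypothetical.

```python
import math
import random


def buffered_kfold(coords, n_folds, radius, seed=0):
    """Yield (train, test) index lists with a spatial buffer around the test set.

    Points are split into random folds; for each test fold, any candidate
    training point closer than `radius` to a test point is discarded.
    """
    indices = list(range(len(coords)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]

    for test in folds:
        test_set = set(test)
        train = [
            i for i in indices
            if i not in test_set
            and all(math.dist(coords[i], coords[j]) >= radius for j in test)
        ]
        yield sorted(train), sorted(test)


# Toy data: 60 random points in a 100 x 100 square, buffer of 10 units.
rng = random.Random(7)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(60)]
splits = list(buffered_kfold(coords, n_folds=5, radius=10.0))
```

A larger `radius` removes more spatially correlated neighbours from training but also shrinks the training set, which is the same data-loss trade-off the main abstract raises for its LAST FOLD scheme.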
This list is automatically generated from the titles and abstracts of the papers in this site.