Taking the Garbage Out of Data-Driven Prediction Across Climate Timescales
- URL: http://arxiv.org/abs/2508.07062v1
- Date: Sat, 09 Aug 2025 17:55:38 GMT
- Title: Taking the Garbage Out of Data-Driven Prediction Across Climate Timescales
- Authors: Jason C. Furtado, Maria J. Molina, Marybeth C. Arcodia, Weston Anderson, Tom Beucler, John A. Callahan, Laura M. Ciasto, Vittorio A. Gensini, Michelle L'Heureux, Kathleen Pegion, Jhayron S. Pérez-Carrasquilla, Maike Sonnewald, Ken Takahashi, Baoqiang Xiang, Brian G. Zimmerman
- Abstract summary: The article establishes protocols for the proper preprocessing of input data for AI/ML models designed for climate prediction. Among its three aims is to educate researchers, developers, and end users on the effects that preprocessing has on climate predictions. Ultimately, implementing the recommended practices will enhance the robustness and transparency of AI/ML in climate prediction studies.
- Score: 0.3032942517187112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence (AI) -- and specifically machine learning (ML) -- applications for climate prediction across timescales are proliferating quickly. The emergence of these methods prompts a revisit to the impact of data preprocessing, a topic familiar to the climate community, as more traditional statistical models work with relatively small sample sizes. Indeed, the skill and confidence in the forecasts produced by data-driven models are directly influenced by the quality of the datasets and how they are treated during model development, thus yielding the colloquialism "garbage in, garbage out." As such, this article establishes protocols for the proper preprocessing of input data for AI/ML models designed for climate prediction (i.e., subseasonal to decadal and longer). The three aims are to: (1) educate researchers, developers, and end users on the effects that preprocessing has on climate predictions; (2) provide recommended practices for data preprocessing for such applications; and (3) empower end users to decipher whether the models they are using are properly designed for their objectives. Specific topics covered in this article include the creation of (standardized) anomalies, dealing with non-stationarity and the spatiotemporally correlated nature of climate data, and handling of extreme values and variables with potentially complex distributions. Case studies will illustrate how using different preprocessing techniques can produce different predictions from the same model, which can create confusion and decrease confidence in the overall process. Ultimately, implementing the recommended practices set forth in this article will enhance the robustness and transparency of AI/ML in climate prediction studies.
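Two of the preprocessing topics the abstract names, creating standardized anomalies and handling non-stationarity, can be illustrated with a minimal sketch. The function names and the synthetic monthly series below are hypothetical illustrations, not code from the article:

```python
import numpy as np

def standardized_anomalies(data, n_months=12):
    """Convert a monthly time series into standardized anomalies.

    For each calendar month, subtract that month's climatological mean and
    divide by its climatological standard deviation, removing the seasonal
    cycle and making departures comparable across months.
    """
    data = np.asarray(data, dtype=float)
    anoms = np.empty_like(data)
    for m in range(n_months):
        month_vals = data[m::n_months]  # all Januaries, all Februaries, ...
        clim_mean = month_vals.mean()
        clim_std = month_vals.std(ddof=1)
        anoms[m::n_months] = (month_vals - clim_mean) / clim_std
    return anoms

def detrend(data):
    """Remove a least-squares linear trend -- one simple (and limited)
    way to address non-stationarity such as a warming background state."""
    data = np.asarray(data, dtype=float)
    t = np.arange(data.size)
    slope, intercept = np.polyfit(t, data, 1)
    return data - (slope * t + intercept)

# Synthetic 30-year monthly series: seasonal cycle + warming trend + noise
rng = np.random.default_rng(0)
months = np.arange(360)
series = (10 * np.sin(2 * np.pi * months / 12)
          + 0.01 * months
          + rng.normal(0, 1, size=360))

anoms = standardized_anomalies(detrend(series))
print(anoms.mean(), anoms.std())  # near 0 and near 1 by construction
```

As the abstract cautions, choices like the order of detrending and anomaly creation, or the period used for the climatology, change the resulting predictors, which is exactly the kind of preprocessing sensitivity the article's case studies examine.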
Related papers
- Do machine learning climate models work in changing climate dynamics? [3.912757881247533]
This research systematically evaluates state-of-the-art ML-based climate models in diverse out-of-distribution (OOD) scenarios. Experiments on large-scale datasets reveal notable performance variability across scenarios, shedding light on the strengths and limitations of current models.
arXiv Detail & Related papers (2025-09-15T17:12:49Z) - CERA: A Framework for Improved Generalization of Machine Learning Models to Changed Climates [1.205087107092304]
Robust generalization under climate change remains a major challenge for machine learning applications in climate science. We present CERA, a machine learning framework consisting of an autoencoder with explicit latent-space alignment. Without training on labeled data from a +4K climate, CERA leverages labeled control-climate data and unlabeled warmer-climate inputs to improve generalization to the warmer climate.
arXiv Detail & Related papers (2025-08-15T20:28:04Z) - Data-driven Seasonal Climate Predictions via Variational Inference and Transformers [31.98107454758077]
We train generative models on climate model output for seasonal predictions. We analyse the method's performance in predicting interannual anomalies beyond the climate change-induced trend.
arXiv Detail & Related papers (2025-03-26T11:51:23Z) - Generative assimilation and prediction for weather and climate [9.319028023682494]
We introduce Generative Assimilation and Prediction (GAP), a unified framework for assimilation and prediction of both weather and climate. It excels in a broad range of weather-climate related tasks, including data assimilation, seamless prediction, and climate simulation.
arXiv Detail & Related papers (2025-03-04T22:36:29Z) - On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z) - ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast.
We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples.
Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
arXiv Detail & Related papers (2024-02-02T10:34:13Z) - Comparing Data-Driven and Mechanistic Models for Predicting Phenology in Deciduous Broadleaf Forests [47.285748922842444]
We train a deep neural network to predict a phenological index from meteorological time series.
We find that this approach outperforms traditional process-based models.
arXiv Detail & Related papers (2024-01-08T15:29:23Z) - Towards an end-to-end artificial intelligence driven global weather forecasting system [57.5191940978886]
We present an AI-based data assimilation model, i.e., Adas, for global weather variables.
We demonstrate that Adas can assimilate global observations to produce high-quality analyses, enabling the system to operate stably over the long term.
We are the first to apply these methods to real-world scenarios, which are more challenging and have considerable practical application potential.
arXiv Detail & Related papers (2023-12-18T09:05:28Z) - Predicting Temperature of Major Cities Using Machine Learning and Deep Learning [0.0]
We use a database compiled by the University of Dayton, which records temperature changes in major cities, to predict the temperature of different cities at any time in the future.
This document contains our methodology for being able to make such predictions.
arXiv Detail & Related papers (2023-09-23T10:23:00Z) - SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models [13.331224394143117]
Uncertainty quantification is crucial to decision-making.
The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts.
We propose to amortize the computational cost by emulating these forecasts with deep generative diffusion models learned from historical data.
arXiv Detail & Related papers (2023-06-24T22:00:06Z) - ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z) - Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.