ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning
- URL: http://arxiv.org/abs/2311.03721v1
- Date: Tue, 7 Nov 2023 04:55:36 GMT
- Title: ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning
- Authors: Julia Kaltenborn, Charlotte E. E. Lange, Venkatesh Ramesh, Philippe
Brouillard, Yaniv Gurwicz, Chandni Nagda, Jakob Runge, Peer Nowack and David
Rolnick
- Abstract summary: Climate models have been key for assessing the impact of climate change and simulating future climate scenarios.
The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks.
Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives.
- Score: 26.151056828513962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Climate models have been key for assessing the impact of climate change and
simulating future climate scenarios. The machine learning (ML) community has
taken an increased interest in supporting climate scientists' efforts on
various tasks such as climate model emulation, downscaling, and prediction
tasks. Many of those tasks have been addressed on datasets created with single
climate models. However, both the climate science and ML communities have
suggested that to address those tasks at scale, we need large, consistent, and
ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset
containing the inputs and outputs of 36 climate models from the Input4MIPs and
CMIP6 archives. In addition, we provide a modular dataset pipeline for
retrieving and preprocessing additional climate models and scenarios. We
showcase the potential of our dataset by using it as a benchmark for ML-based
climate model emulation. We gain new insights about the performance and
generalization capabilities of the different ML models by analyzing their
performance across different climate models. Furthermore, the dataset can be
used to train an ML emulator on several climate models instead of just one.
Such a "super emulator" can quickly project new climate change scenarios,
complementing existing scenarios already provided to policymakers. We believe
ClimateSet will create the basis needed for the ML community to tackle
climate-related tasks at scale.
Related papers
- Modeling chaotic Lorenz ODE System using Scientific Machine Learning [1.4633779950109127]
In this paper, we have integrated Scientific Machine Learning (SciML) methods into foundational weather models.
By combining the interpretability of physical climate models with the computational power of neural networks, SciML models can prove to be a reliable tool for modeling climate.
arXiv Detail & Related papers (2024-10-09T01:17:06Z)
- Aurora: A Foundation Model of the Atmosphere [56.97266186291677]
We introduce Aurora, a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data.
In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts.
arXiv Detail & Related papers (2024-05-20T14:45:18Z)
- Comparing Data-Driven and Mechanistic Models for Predicting Phenology in Deciduous Broadleaf Forests [47.285748922842444]
We train a deep neural network to predict a phenological index from meteorological time series.
We find that this approach outperforms traditional process-based models.
arXiv Detail & Related papers (2024-01-08T15:29:23Z)
- Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM [77.17254959695218]
Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks.
We propose a light-weight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on a conversational-style instruction tuning Arabic dataset Clima500-Instruct.
Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation.
arXiv Detail & Related papers (2023-12-14T22:04:07Z)
- Towards Causal Representations of Climate Model Data [18.82507552857727]
This work delves into the potential of causal representation learning, specifically the Causal Discovery with Single-parent Decoding (CDSD) method.
Our findings shed light on the challenges, limitations, and promise of using CDSD as a stepping stone towards more interpretable and robust climate model emulation.
arXiv Detail & Related papers (2023-12-05T16:13:34Z)
- ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling [20.63843548201849]
ClimateLearn is an open-source library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science.
It is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems.
arXiv Detail & Related papers (2023-07-04T20:36:01Z)
- ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
- Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent sparse spatiotemporal GPs to reduce the computational burden.
We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z)
- Climate-Invariant Machine Learning [0.8831201550856289]
Current climate models require representations of processes that occur at scales smaller than model grid size.
Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on.
We propose a new framework - termed "climate-invariant" ML - incorporating knowledge of climate processes into ML algorithms.
arXiv Detail & Related papers (2021-12-14T07:02:57Z)
- DeepClimGAN: A High-Resolution Climate Data Generator [60.59639064716545]
Earth system models (ESMs) are often used to generate future projections of climate change scenarios.
As a compromise, emulators are substantially less expensive but may not have all of the complexity of an ESM.
Here we demonstrate the use of a conditional generative adversarial network (GAN) to act as an ESM emulator.
arXiv Detail & Related papers (2020-11-23T20:13:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.