ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative
Transfer in Weather and Climate Models
- URL: http://arxiv.org/abs/2111.14671v1
- Date: Mon, 29 Nov 2021 16:32:31 GMT
- Title: ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative
Transfer in Weather and Climate Models
- Authors: Salva Rühling Cachay, Venkatesh Ramesh, Jason N. S. Cole, Howard
Barker, David Rolnick
- Abstract summary: We build a large dataset, ClimART, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model.
ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed.
We also present several novel baselines that indicate shortcomings of datasets and network architectures used in prior work.
- Score: 13.514499533538789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerical simulations of Earth's weather and climate require substantial
amounts of computation. This has led to a growing interest in replacing
subroutines that explicitly compute physical processes with approximate machine
learning (ML) methods that are fast at inference time. Within weather and
climate models, atmospheric radiative transfer (RT) calculations are especially
expensive. This has made them a popular target for neural network-based
emulators. However, prior work is hard to compare due to the lack of a
comprehensive dataset and standardized best practices for ML benchmarking. To
fill this gap, we build a large dataset, ClimART, with more than 10
million samples from present, pre-industrial, and future climate conditions,
based on the Canadian Earth System Model. ClimART poses several methodological
challenges for the ML community, such as multiple out-of-distribution test
sets, underlying domain physics, and a trade-off between accuracy and inference
speed. We also present several novel baselines that indicate shortcomings of
datasets and network architectures used in prior work. Download instructions,
baselines, and code are available at: https://github.com/RolnickLab/climart
Related papers
- The impact of internal variability on benchmarking deep learning climate emulators [2.3342885570554652]
Full-complexity Earth system models (ESMs) are computationally very expensive, limiting their use in exploring the climate outcomes of multiple emission pathways.
More efficient emulators that approximate ESMs can directly map emissions onto climate datasets.
We investigate a popular benchmark in data-driven climate emulation, ClimateBench, on which deep learning-based emulators are currently achieving the best performance.
arXiv Detail & Related papers (2024-08-09T18:17:17Z)
- NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z)
- EM-GANSim: Real-time and Accurate EM Simulation Using Conditional GANs for 3D Indoor Scenes [55.2480439325792]
We present a novel machine-learning (ML) approach (EM-GANSim) for real-time electromagnetic (EM) propagation.
In practice, it can compute the signal strength in a few milliseconds on any location in 3D indoor environments.
arXiv Detail & Related papers (2024-05-27T17:19:02Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation [45.201929285600606]
We present ClimSim-Online, which includes an end-to-end workflow for developing hybrid ML-physics simulators.
The dataset is global and spans ten years at a high sampling frequency.
We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators.
arXiv Detail & Related papers (2023-06-14T21:26:31Z)
- ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
- Semantic Segmentation under Adverse Conditions: A Weather and Nighttime-aware Synthetic Data-based Approach [6.482184764321084]
Recent semantic segmentation models perform well under standard weather conditions but struggle with adverse weather conditions and nighttime.
We present a novel architecture specifically designed for using synthetic training data for domain adaptation.
We propose a simple yet powerful addition to DeepLabV3+ by using weather and time-of-the-day supervisors trained with multi-task learning.
arXiv Detail & Related papers (2022-10-11T17:14:22Z)
- DeepClimGAN: A High-Resolution Climate Data Generator [60.59639064716545]
Earth system models (ESMs) are often used to generate future projections of climate change scenarios.
As a compromise, emulators are substantially less expensive but may not have all of the complexity of an ESM.
Here we demonstrate the use of a conditional generative adversarial network (GAN) to act as an ESM emulator.
arXiv Detail & Related papers (2020-11-23T20:13:37Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to 50×), with significantly reduced training data generation (up to 30×) and better accuracy (+8.7%) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- WeatherBench: A benchmark dataset for data-driven weather forecasting [17.76377510880905]
We present a benchmark dataset for data-driven medium-range weather forecasting.
We provide data derived from the ERA5 archive that has been processed to facilitate the use in machine learning models.
We provide baseline scores from simple linear regression techniques, deep learning models, as well as purely physical forecasting models.
arXiv Detail & Related papers (2020-02-02T19:20:46Z)