ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation
- URL: http://arxiv.org/abs/2306.08754v6
- Date: Mon, 8 Jul 2024 19:33:54 GMT
- Title: ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation
- Authors: Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I. Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Benjamin R. Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahmed, Elizabeth Barnes, Noah D. Brenowitz, Christopher Bretherton, Veronika Eyring, Savannah Ferretti, Nicholas Lutsko, Pierre Gentine, Stephan Mandt, J. David Neelin, Rose Yu, Laure Zanna, Nathan Urban, Janni Yuval, Ryan Abernathey, Pierre Baldi, Wayne Chuang, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Po-Lun Ma, Sara Shamekh, Guang Zhang, Michael Pritchard,
- Abstract summary: We present ClimSim-Online, which includes an end-to-end workflow for developing hybrid ML-physics simulators.
The dataset is global and spans ten years at a high sampling frequency.
We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators.
- Score: 45.201929285600606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid ML-physics simulations require domain-specific data and workflows that have been inaccessible to many ML experts. As an extension of the ClimSim dataset (Yu et al., 2024), we present ClimSim-Online, which also includes an end-to-end workflow for developing hybrid ML-physics simulators. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various ML baselines, alongside a hybrid baseline simulator, to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid ML-physics and high-fidelity climate simulations.
Related papers
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous
Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z) - Sampling Hybrid Climate Simulation at Scale to Reliably Improve Machine Learning Parameterization [1.9199275795132604]
Machine-learning (ML) parameterizations of subgrid processes may one day replace conventional parameterizations.
Our work examines the coupled behavior of full-physics ML parameterizations using large ensembles of hybrid simulations.
We identify design decisions that yield unambiguous improvements to offline and online performance.
arXiv Detail & Related papers (2023-09-28T05:34:29Z) - In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Efficient Climate Simulation via Machine Learning Method [21.894503534237664]
We develop a framework called NeuroClim for hybrid modeling under the real-world scenario.
NeuroClim consists of three parts: (1) Platform, (2) dataset, and (3) Metrics.
arXiv Detail & Related papers (2022-08-15T07:47:38Z) - Learning Large-scale Subsurface Simulations with a Hybrid Graph Network
Simulator [57.57321628587564]
We introduce Hybrid Graph Network Simulator (HGNS) for learning reservoir simulations of 3D subsurface fluid flows.
HGNS consists of a subsurface graph neural network (SGNN) to model the evolution of fluid flows, and a 3D-U-Net to model the evolution of pressure.
Using an industry-standard subsurface flow dataset (SPE-10) with 1.1 million cells, we demonstrate that HGNS is able to reduce the inference time up to 18 times compared to standard subsurface simulators.
arXiv Detail & Related papers (2022-06-15T17:29:57Z) - Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z) - ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative
Transfer in Weather and Climate Models [13.514499533538789]
We build a large dataset, ClimART, with more than emph10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model.
ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed.
We also present several novel baselines that indicate shortcomings of datasets and network architectures used in prior work.
arXiv Detail & Related papers (2021-11-29T16:32:31Z) - Using Machine Learning at Scale in HPC Simulations with SmartSim: An
Application to Ocean Climate Modeling [52.77024349608834]
We demonstrate the first climate-scale, numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNN) using SmartSim.
SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for traditional HPC simulations.
arXiv Detail & Related papers (2021-04-13T19:27:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.