Using Machine Learning at Scale in HPC Simulations with SmartSim: An
Application to Ocean Climate Modeling
- URL: http://arxiv.org/abs/2104.09355v1
- Date: Tue, 13 Apr 2021 19:27:28 GMT
- Authors: Sam Partee, Matthew Ellis, Alessandro Rigazzi, Scott Bachman, Gustavo
Marques, Andrew Shao, Benjamin Robbins
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate the first climate-scale, numerical ocean simulations improved
through distributed, online inference of Deep Neural Networks (DNN) using
SmartSim. SmartSim is a library dedicated to enabling online analysis and
Machine Learning (ML) for traditional HPC simulations. In this paper, we detail
the SmartSim architecture and provide benchmarks including online inference
with a shared ML model on heterogeneous HPC systems. We demonstrate the
capability of SmartSim by using it to run a 12-member ensemble of global-scale,
high-resolution ocean simulations, each spanning 19 compute nodes, all
communicating with the same ML architecture at each simulation timestep. In
total, 970 billion inferences are collectively served by running the ensemble
for a total of 120 simulated years. Finally, we show our solution is stable
over the full duration of the model integrations, and that the inclusion of
machine learning has minimal impact on the simulation runtimes.
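The headline figures in the abstract can be related with some simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the 120 simulated years are the ensemble total, split evenly across the 12 members, and it counts only the simulation nodes named in the abstract (any nodes hosting the ML model are not counted). The function name `ensemble_summary` and its dictionary keys are hypothetical.

```python
# Figures quoted in the abstract.
ENSEMBLE_MEMBERS = 12
NODES_PER_MEMBER = 19
TOTAL_INFERENCES = 970_000_000_000  # "970 billion inferences"
TOTAL_SIMULATED_YEARS = 120         # assumed to be the ensemble-wide total


def ensemble_summary(
    members=ENSEMBLE_MEMBERS,
    nodes_per_member=NODES_PER_MEMBER,
    total_inferences=TOTAL_INFERENCES,
    total_years=TOTAL_SIMULATED_YEARS,
):
    """Derive aggregate quantities from the abstract's numbers.

    Assumes an even split of simulated years and inferences across members.
    """
    return {
        # Nodes running the ocean simulations only.
        "simulation_nodes": members * nodes_per_member,
        # Simulated years per member under the even-split assumption.
        "years_per_member": total_years / members,
        # Average inferences served per simulated year, ensemble-wide.
        "inferences_per_simulated_year": total_inferences / total_years,
        # Average inferences attributable to each member.
        "inferences_per_member": total_inferences / members,
    }


if __name__ == "__main__":
    for name, value in ensemble_summary().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the ensemble occupies 228 simulation nodes, each member covers 10 simulated years, and roughly 8.1 billion inferences are served per simulated year.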
Related papers
- Graph Convolutional Neural Networks as Surrogate Models for Climate Simulation [0.1884913108327873]
We leverage fully-connected neural networks (FCNNs) and graph convolutional neural networks (GCNNs) to enable rapid simulation and uncertainty quantification.
Our surrogate simulated 80 years in approximately 310 seconds on a single A100 GPU, compared to weeks for the Earth system model (ESM).
(2024-09-19T14:41:15Z)
- Bridging the Sim-to-Real Gap with Bayesian Inference [53.61496586090384]
We present SIM-FSVGD for learning robot dynamics from data.
We use low-fidelity physical priors to regularize the training of neural network models.
We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system.
(2024-03-25T11:29:32Z)
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
(2023-10-12T20:49:15Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
(2023-06-22T14:07:54Z)
- ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation [45.201929285600606]
We present ClimSim-Online, which includes an end-to-end workflow for developing hybrid ML-physics simulators.
The dataset is global and spans ten years at a high sampling frequency.
We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators.
(2023-06-14T21:26:31Z)
- BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym [59.53949960353792]
BayesSimIG is a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym.
BayesSimIG provides an integration with TensorBoard to easily visualize slices of high-dimensional posteriors.
(2021-07-09T16:21:31Z)
- NVIDIA SimNet^{TM}: an AI-accelerated multi-physics simulation framework [5.509715131727269]
We present SimNet, an AI-driven multi-physics simulation framework, to accelerate simulations across a wide range of disciplines.
SimNet addresses a wide range of use cases: coupled forward simulations without any training data, as well as inverse and data assimilation problems.
(2020-12-14T20:55:48Z)
- Smaller World Models for Reinforcement Learning [0.5156484100374059]
We propose a new neural network architecture for world models based on a vector quantized-variational autoencoder (VQ-VAE).
A model-free PPO agent is trained purely on simulated experience from the world model.
We show that we reach comparable performance to the SimPLe algorithm, while our model is significantly smaller.
(2020-10-12T15:02:41Z)