Related papers: Using Machine Learning at Scale in HPC Simulations with SmartSim: An Application to Ocean Climate Modeling


URL: http://arxiv.org/abs/2104.09355v1
Date: Tue, 13 Apr 2021 19:27:28 GMT
Title: Using Machine Learning at Scale in HPC Simulations with SmartSim: An Application to Ocean Climate Modeling
Authors: Sam Partee, Matthew Ellis, Alessandro Rigazzi, Scott Bachman, Gustavo Marques, Andrew Shao, Benjamin Robbins
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We demonstrate the first climate-scale, numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNN) using SmartSim. SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for traditional HPC simulations. In this paper, we detail the SmartSim architecture and provide benchmarks including online inference with a shared ML model on heterogeneous HPC systems. We demonstrate the capability of SmartSim by using it to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, all communicating with the same ML architecture at each simulation timestep. In total, 970 billion inferences are collectively served by running the ensemble for a total of 120 simulated years. Finally, we show our solution is stable over the full duration of the model integrations, and that the inclusion of machine learning has minimal impact on the simulation runtimes.

Related papers

How many simulations do we need for simulation-based inference in cosmology?
We show that currently available simulation suites, such as the Quijote Latin Hypercube (LH) with 2000 simulations, do not provide sufficient training data for a generic neural network to reach the optimal regime. We create the largest publicly released simulation data set in cosmology, the Big Sobol Sequence (BSQ), consisting of 32,768 $\Lambda$CDM n-body simulations uniformly covering the $\Lambda$CDM parameter space.
(arXiv, 2025-03-17)
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
(arXiv, 2024-12-23)
Graph Convolutional Neural Networks as Surrogate Models for Climate Simulation
We leverage fully-connected neural networks (FCNNs) and graph convolutional neural networks (GCNNs) to enable rapid simulation and uncertainty quantification. Our surrogate simulated 80 years in approximately 310 seconds on a single A100 GPU, compared to weeks for the ESM model.
(arXiv, 2024-09-19)
Bridging the Sim-to-Real Gap with Bayesian Inference
We present SIM-FSVGD for learning robot dynamics from data. We use low-fidelity physical priors to regularize the training of neural network models. We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system.
(arXiv, 2024-03-25)
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training. We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
(arXiv, 2023-10-12)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
(arXiv, 2023-06-22)
ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation
We present ClimSim-Online, which includes an end-to-end workflow for developing hybrid ML-physics simulators. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators.
(arXiv, 2023-06-14)
BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym
BayesSimIG is a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym. BayesSimIG provides an integration with TensorBoard to easily visualize slices of high-dimensional posteriors.
(arXiv, 2021-07-09)
NVIDIA SimNet™: an AI-accelerated multi-physics simulation framework
We present SimNet, an AI-driven multi-physics simulation framework, to accelerate simulations across a wide range of disciplines. SimNet addresses a wide range of use cases: coupled forward simulations without any training data, as well as inverse and data assimilation problems.
(arXiv, 2020-12-14)
Smaller World Models for Reinforcement Learning
We propose a new neural network architecture for world models based on a vector quantized-variational autoencoder (VQ-VAE). A model-free PPO agent is trained purely on simulated experience from the world model. We show that we reach performance comparable to the SimPLe algorithm, while our model is significantly smaller.
(arXiv, 2020-10-12)

This list is automatically generated from the titles and abstracts of the papers on this site.