Colmena: Scalable Machine-Learning-Based Steering of Ensemble
Simulations for High Performance Computing
- URL: http://arxiv.org/abs/2110.02827v1
- Date: Wed, 6 Oct 2021 14:56:53 GMT
- Title: Colmena: Scalable Machine-Learning-Based Steering of Ensemble
Simulations for High Performance Computing
- Authors: Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan
Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A.
Curtiss, Rajeev Thakur, Ian Foster
- Abstract summary: We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when.
Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems.
We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
- Score: 3.5604179670745237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific applications that involve simulation ensembles can be accelerated
greatly by using experiment design methods to select the best simulations to
perform. Methods that use machine learning (ML) to create proxy models of
simulations show particular promise for guiding ensembles but are challenging
to deploy because of the need to coordinate dynamic mixes of simulation and
learning tasks. We present Colmena, an open-source Python framework that allows
users to steer campaigns by providing just the implementations of individual
tasks plus the logic used to choose which tasks to execute when. Colmena
handles task dispatch, results collation, ML model invocation, and ML model
(re)training, using Parsl to execute tasks on HPC systems. We describe the
design of Colmena and illustrate its capabilities by applying it to electrolyte
design, where it both scales to 65536 CPUs and accelerates the discovery rate
for high-performance molecules by a factor of 100 over unguided searches.
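The steering pattern described in the abstract (run simulations, collate results, update a surrogate model, let the model pick the next simulations) can be illustrated with a minimal sketch. This is not Colmena's API and does not use Parsl: a standard-library thread pool stands in for the task executor, and `simulate`, `predict`, and `steer` are hypothetical names with a toy nearest-neighbour surrogate, chosen only to make the control flow concrete.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x: int) -> int:
    """Hypothetical stand-in for an expensive simulation; best score at x = 14."""
    return -((x - 14) ** 2)


def predict(results: list[tuple[int, int]], x: int) -> int:
    """Toy surrogate: reuse the score of the nearest already-simulated point."""
    nearest = min(results, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def steer(candidates: list[int], seeds: list[int],
          guided_rounds: int, batch_size: int) -> list[tuple[int, int]]:
    """Run the seed simulations, then `guided_rounds` surrogate-selected batches."""
    results: list[tuple[int, int]] = []
    pending = [c for c in candidates if c not in seeds]
    to_run = list(seeds)
    # Assumption: a local thread pool replaces the HPC task executor.
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for round_index in range(guided_rounds + 1):
            # Task dispatch and results collation (results kept in submit order).
            scores = pool.map(simulate, to_run)
            results.extend(zip(to_run, scores))
            if round_index == guided_rounds or not pending:
                break
            # "Retraining" is implicit: the surrogate reads the grown results list.
            # Steering logic: rank untested candidates by predicted score.
            pending.sort(key=lambda x: predict(results, x), reverse=True)
            to_run, pending = pending[:batch_size], pending[batch_size:]
    return results


if __name__ == "__main__":
    history = steer(list(range(21)), seeds=[0, 20], guided_rounds=2, batch_size=2)
    best = max(history, key=lambda pair: pair[1])
    print(f"simulated {len(history)} of 21 candidates, best = {best}")
```

In this toy setup the loop reaches the best candidate after 6 of the 21 possible simulations. The only user-supplied pieces are the task function and the ranking rule, which mirrors the division of labor the abstract describes.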
Related papers
- DrEureka: Language Model Guided Sim-To-Real Transfer [64.14314476811806]
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale.
In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design.
Our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball.
arXiv Detail & Related papers (2024-06-04T04:53:05Z)
- Code Simulation Challenges for Large Language Models [6.970495767499435]
This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks.
We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions.
We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the pattern of compilers.
arXiv Detail & Related papers (2024-01-17T09:23:59Z)
- MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows [12.337972297411003]
Machine learning (ML) is increasingly becoming a common tool in computational chemistry.
MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations.
The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations.
arXiv Detail & Related papers (2023-10-31T03:41:39Z)
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- A Step Towards Efficient Evaluation of Complex Perception Tasks in Simulation [5.4954641673299145]
We propose an approach that enables efficient large-scale testing using simplified low-fidelity simulators.
Our approach relies on designing an efficient surrogate model corresponding to the compute intensive components of the task under test.
We demonstrate the efficacy of our methodology by evaluating the performance of an autonomous driving task in the Carla simulator with reduced computational expense.
arXiv Detail & Related papers (2021-09-28T13:50:21Z)
- Achieving 100X faster simulations of complex biological phenomena by coupling ML to HPC ensembles [47.44377051031385]
We present DeepDriveMD, a tool for a range of prototypical ML-driven HPC simulation scenarios.
We use it to quantify improvements in the scientific performance of ML-driven ensemble-based applications.
arXiv Detail & Related papers (2021-04-10T15:52:39Z)
- Integrating Machine Learning with HPC-driven Simulations for Enhanced Student Learning [0.0]
We develop a web application that supports both HPC-driven simulation and the ML surrogate methods to produce simulation outputs.
The evaluation of the tool via in-classroom student feedback and surveys shows that the ML-enhanced tool provides a dynamic and responsive simulation environment.
arXiv Detail & Related papers (2020-08-24T22:48:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.