Related papers: Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing

URL: http://arxiv.org/abs/2110.02827v1
Date: Wed, 6 Oct 2021 14:56:53 GMT
Title: Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing
Authors: Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster
Abstract summary: We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
Score: 3.5604179670745237
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.

Related papers

AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs [68.99086112477565]
Transformer-based large language models (LLMs) have demonstrated exceptional capabilities in sequence modeling and text generation. Existing heterogeneous training methods significantly expand the scale of trainable models but introduce substantial communication overheads and CPU workloads. We propose AutoHete, an automatic and efficient heterogeneous training system compatible with both single-GPU and multi-GPU environments.
arXiv Detail & Related papers (2025-02-27T14:46:22Z)
Simulation Streams: A Programming Paradigm for Controlling Large Language Models and Building Complex Systems with Generative AI [3.3126968968429407]
Simulation Streams is a programming paradigm designed to efficiently control and leverage Large Language Models (LLMs). Our primary goal is to create a framework that harnesses the agentic abilities of LLMs while addressing their limitations in maintaining consistency.
arXiv Detail & Related papers (2025-01-30T16:38:03Z)
LLM Agent for Fire Dynamics Simulations [3.0031348283981987]
FoamPilot is a proof-of-concept agent designed to enhance the usability of FireFOAM. FireFOAM is a solver for fire dynamics and fire suppression simulations built using OpenFOAM. FoamPilot provides three core functionalities: code insight, case configuration and simulation evaluation.
arXiv Detail & Related papers (2024-12-22T20:03:35Z)
DrEureka: Language Model Guided Sim-To-Real Transfer [64.14314476811806]
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball.
arXiv Detail & Related papers (2024-06-04T04:53:05Z)
Code Simulation Challenges for Large Language Models [6.970495767499435]
This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the pattern of compilers.
arXiv Detail & Related papers (2024-01-17T09:23:59Z)
MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows [12.337972297411003]
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations. The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations.
arXiv Detail & Related papers (2023-10-31T03:41:39Z)
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training. We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage. The model is trained to maximize the agent's expected performance by selecting from the storage promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula. In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
A Step Towards Efficient Evaluation of Complex Perception Tasks in Simulation [5.4954641673299145]
We propose an approach that enables efficient large-scale testing using simplified low-fidelity simulators. Our approach relies on designing an efficient surrogate model corresponding to the compute intensive components of the task under test. We demonstrate the efficacy of our methodology by evaluating the performance of an autonomous driving task in the Carla simulator with reduced computational expense.
arXiv Detail & Related papers (2021-09-28T13:50:21Z)
Achieving 100X faster simulations of complex biological phenomena by coupling ML to HPC ensembles [47.44377051031385]
We present DeepDriveMD, a tool for a range of prototypical ML-driven HPC simulation scenarios. We use it to quantify improvements in the scientific performance of ML-driven ensemble-based applications.
arXiv Detail & Related papers (2021-04-10T15:52:39Z)
Integrating Machine Learning with HPC-driven Simulations for Enhanced Student Learning [0.0]
We develop a web application that supports both HPC-driven simulation and the ML surrogate methods to produce simulation outputs. The evaluation of the tool via in-classroom student feedback and surveys shows that the ML-enhanced tool provides a dynamic and responsive simulation environment.
arXiv Detail & Related papers (2020-08-24T22:48:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.