AI-coupled HPC Workflows
- URL: http://arxiv.org/abs/2208.11745v1
- Date: Wed, 24 Aug 2022 19:16:43 GMT
- Title: AI-coupled HPC Workflows
- Authors: Shantenu Jha, Vincent R. Pascuzzi, Matteo Turilli
- Abstract summary: The introduction of AI/ML models into traditional HPC workflows has been an enabler of highly accurate modeling.
The chapter discusses various modes of integrating AI/ML models into HPC computations, resulting in diverse types of AI-coupled HPC workflows.
- Score: 1.5469452301122175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increasingly, scientific discovery requires sophisticated and scalable
workflows. Workflows have become the "new applications," wherein multi-scale
computing campaigns comprise multiple and heterogeneous executable tasks. In
particular, the introduction of AI/ML models into the traditional HPC workflows
has been an enabler of highly accurate modeling, typically reducing
computational needs compared to traditional methods. This chapter discusses
various modes of integrating AI/ML models to HPC computations, resulting in
diverse types of AI-coupled HPC workflows. The increasing need of coupling
AI/ML and HPC across scientific domains is motivated, and then exemplified by a
number of production-grade use cases for each mode. We additionally discuss the
primary challenges of extreme-scale AI-coupled HPC campaigns -- task
heterogeneity, adaptivity, performance -- and several framework and middleware
solutions which aim to address them. While both HPC workflow and AI/ML
computing paradigms are independently effective, we highlight how their
integration, and ultimate convergence, is leading to significant improvements
in scientific performance across a range of domains, ultimately resulting in
scientific explorations otherwise unattainable.
Related papers
- GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI [64.57616646552869]
This paper explores collaborative AI systems that use workflows to integrate models, data sources, and pipelines to solve complex and diverse tasks.
We introduce GenAgent, an LLM-based framework that automatically generates complex workflows, offering greater flexibility and scalability compared to monolithic models.
The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations.
arXiv Detail & Related papers (2024-09-02T17:44:10Z)
- Employing Artificial Intelligence to Steer Exascale Workflows with Colmena [37.42013214123005]
Colmena allows scientists to define how their application should respond to events as a series of cooperative agents.
We describe the challenges we overcame while deploying applications on exascale systems, and the science we have enhanced through AI.
Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing.
arXiv Detail & Related papers (2024-08-26T17:21:19Z)
- Using AI libraries for Incompressible Computational Fluid Dynamics [0.7734726150561089]
We present a novel methodology to bring the power of both AI software and hardware into the field of numerical modelling.
We use the proposed methodology to solve the advection-diffusion equation, the non-linear Burgers equation and incompressible flow past a bluff body.
arXiv Detail & Related papers (2024-02-27T22:00:50Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- HPC-GPT: Integrating Large Language Model for High-Performance Computing [3.8078849170829407]
We propose HPC-GPT, a novel LLaMA-based model that has undergone supervised fine-tuning on generated QA (Question-Answer) instances for the HPC domain.
To evaluate its effectiveness, we concentrate on two HPC tasks: managing AI models and datasets for HPC, and data race detection.
Our experiments on open-source benchmarks yield extensive results, underscoring HPC-GPT's potential to bridge the performance gap between LLMs and HPC-specific tasks.
arXiv Detail & Related papers (2023-10-03T01:34:55Z)
- Meta-Learning for Airflow Simulations with Graph Neural Networks [3.52359746858894]
We present a meta-learning approach to enhance the performance of learned models on out-of-distribution (OoD) samples.
Specifically, we set the airflow simulation in CFD over various airfoils as a meta-learning problem, where each set of examples defined on a single airfoil shape is treated as a separate task.
We experimentally demonstrate the efficiency of the proposed approach for improving the OoD generalization performance of learned models.
arXiv Detail & Related papers (2023-06-18T19:25:13Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Integrating Deep Learning in Domain Sciences at Exascale [2.241545093375334]
We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently.
We propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems.
We present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications with AI.
arXiv Detail & Related papers (2020-11-23T03:09:58Z)