Asynchronous Execution of Heterogeneous Tasks in ML-driven HPC Workflows
- URL: http://arxiv.org/abs/2208.11069v2
- Date: Tue, 27 Jun 2023 16:13:22 GMT
- Title: Asynchronous Execution of Heterogeneous Tasks in ML-driven HPC Workflows
- Authors: Vincent R. Pascuzzi, Ozgur O. Kilic, Matteo Turilli, Shantenu Jha
- Abstract summary: Asynchronous execution is crucial to improve resource utilization and task throughput and to reduce workflows' makespan.
We investigate the requirements and properties of the asynchronous task execution of machine learning (ML)-driven high performance computing (HPC) workflows.
Our experiments represent relevant scientific drivers and are performed at scale on Summit; they show that the performance enhancements due to asynchronous execution are consistent with our model.
- Score: 1.376408511310322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heterogeneous scientific workflows consist of numerous types of tasks that
require executing on heterogeneous resources. Asynchronous execution of those
tasks is crucial to improve resource utilization, task throughput and reduce
workflows' makespan. Therefore, middleware capable of scheduling and executing
different task types across heterogeneous resources must enable asynchronous
execution of tasks. In this paper, we investigate the requirements and
properties of the asynchronous task execution of machine learning (ML)-driven
high performance computing (HPC) workflows. We model the degree of
asynchronicity permitted for arbitrary workflows and propose key metrics that
can be used to determine qualitative benefits when employing asynchronous
execution. Our experiments represent relevant scientific drivers, we perform
them at scale on Summit, and we show that the performance enhancements due to
asynchronous execution are consistent with our model.
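To make the makespan claim in the abstract concrete, here is a minimal sketch comparing strictly sequential execution with asynchronous execution of a small task graph. It is not the model proposed in the paper. The task names, durations, and dependencies are hypothetical, and the asynchronous case assumes enough resources to run every ready task at once, so its makespan equals the length of the longest dependency chain.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sequential_makespan(durations):
    """Makespan when tasks run one after another: the sum of all durations."""
    return sum(durations.values())


def asynchronous_makespan(durations, deps):
    """Makespan when each task starts as soon as its dependencies finish.

    Assumes unlimited resources, so the result is the critical-path length.
    `deps` maps a task to the set of tasks it must wait for.
    """
    graph = {task: deps.get(task, ()) for task in durations}
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[task]), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0.0)


# Hypothetical ML-driven workflow: a simulation feeds an analysis step,
# a training task runs independently, and selection waits for both.
durations = {"simulate": 4.0, "train": 3.0, "analyze": 2.0, "select": 1.0}
deps = {"analyze": {"simulate"}, "select": {"train", "analyze"}}

seq = sequential_makespan(durations)
asy = asynchronous_makespan(durations, deps)
print(f"sequential: {seq}, asynchronous: {asy}")
```

In this toy graph the training task overlaps with the simulate-then-analyze chain, so the asynchronous makespan is shorter than the sequential one. On a real machine the gain also depends on how many heterogeneous resources are available, which this sketch ignores.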
Related papers
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z) - GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI [64.57616646552869]
This paper explores collaborative AI systems that use workflows to integrate models, data sources, and pipelines in order to solve complex and diverse tasks.
We introduce GenAgent, an LLM-based framework that automatically generates complex workflows, offering greater flexibility and scalability compared to monolithic models.
The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations.
arXiv Detail & Related papers (2024-09-02T17:44:10Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z) - A Makespan and Energy-Aware Scheduling Algorithm for Workflows under Reliability Constraint on a Multiprocessor Platform [11.427019313284]
We propose a workflow scheduling algorithm to minimize the makespan and energy for a given reliability constraint.
We show that our algorithms, MERT and EAFTS, outperform the state-of-the-art approaches.
arXiv Detail & Related papers (2022-12-19T07:03:04Z) - Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z) - Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z) - A Fast Edge-Based Synchronizer for Tasks in Real-Time Artificial Intelligence Applications [0.8122270502556374]
Task synchronization across devices is an important problem that affects the timely progress of an AI application.
We develop a fast edge-based synchronization scheme that can time-align the execution of input-output tasks as well as compute tasks.
arXiv Detail & Related papers (2020-12-21T23:02:21Z) - Sequence-to-sequence models for workload interference [1.988145627448243]
Co-scheduling of jobs in data centers is a challenging scenario, where jobs can compete for resources, leading to severe slowdowns or failed executions.
Current techniques, most of them already involving machine learning and job modeling, are based on workload behavior summarization across time.
We propose a methodology for modeling co-scheduling of jobs in data centers, based on their behavior towards resources and execution time.
arXiv Detail & Related papers (2020-06-25T14:11:46Z) - Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System [12.813275501138193]
Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach.
Our programming model distinguishes itself as a very general class of task graph parallelism with in-graph control flow.
We have demonstrated the promising performance of Taskflow in real-world applications.
arXiv Detail & Related papers (2020-04-23T00:21:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.