Intelligent colocation of HPC workloads
- URL: http://arxiv.org/abs/2103.09019v1
- Date: Tue, 16 Mar 2021 12:35:35 GMT
- Title: Intelligent colocation of HPC workloads
- Authors: Felippe V. Zacarias (1, 2 and 3), Vinicius Petrucci (1 and 5), Rajiv
Nishtala (4), Paul Carpenter (3) and Daniel Mossé (5) ((1) Universidade
Federal da Bahia, (2) Universitat Politècnica de Catalunya, (3) Barcelona
Supercomputing Center, (4) Coop, Norway/Norwegian University of Science and
Technology, Norway, (5) University of Pittsburgh)
- Abstract summary: Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized.
It is hard for developers and runtime systems to ensure that all critical resources are fully exploited by a single application, so an attractive technique is to colocate multiple applications on the same server.
We show that server efficiency can be improved by first modeling the expected performance degradation of colocated applications based on measured hardware performance counters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many HPC applications suffer from a bottleneck in the shared caches,
instruction execution units, I/O or memory bandwidth, even though the remaining
resources may be underutilized. It is hard for developers and runtime systems
to ensure that all critical resources are fully exploited by a single
application, so an attractive technique for increasing HPC system utilization
is to colocate multiple applications on the same server. When applications
share critical resources, however, contention on shared resources may lead to
reduced application performance.
In this paper, we show that server efficiency can be improved by first
modeling the expected performance degradation of colocated applications based
on measured hardware performance counters, and then exploiting the model to
determine an optimized mix of colocated applications. This paper presents a new
intelligent resource manager and makes the following contributions: (1) a new
machine learning model to predict the performance degradation of colocated
applications based on hardware counters and (2) an intelligent scheduling
scheme deployed on an existing resource manager to enable application
co-scheduling with minimum performance degradation. Our results show that our
approach achieves performance improvements of 7% (avg) and 12% (max) compared
to the standard policy commonly used by existing job managers.
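The core idea in the abstract — learn a model that maps hardware-counter profiles of candidate applications to an expected colocation slowdown, then let the scheduler pick the pairing with the least predicted degradation — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the counter set, the synthetic training pairs, and the use of a plain least-squares model (in place of the paper's machine-learning model) are all assumptions.

```python
import itertools
import numpy as np

# Hypothetical per-application hardware-counter profiles.
# Columns (illustrative, not the paper's counter set):
# [LLC misses / 1k instr, memory bandwidth (GB/s), IPC]
profiles = {
    "app_A": np.array([12.0, 8.5, 1.1]),  # memory-hungry
    "app_B": np.array([2.0, 1.0, 2.3]),   # compute-bound
    "app_C": np.array([9.0, 7.0, 1.3]),   # memory-hungry
}

# Toy training data: concatenated counter profiles of a colocated pair ->
# measured slowdown of that pair (1.0 = no degradation). Values are invented.
X_train = np.array([
    [12.0, 8.5, 1.1, 9.0, 7.0, 1.3],  # two memory-hungry apps: high contention
    [12.0, 8.5, 1.1, 2.0, 1.0, 2.3],  # memory-hungry + compute-bound: mild
    [2.0, 1.0, 2.3, 9.0, 7.0, 1.3],
    [2.0, 1.0, 2.3, 2.0, 1.0, 2.3],   # two compute-bound apps: near-ideal
])
y_train = np.array([1.45, 1.08, 1.10, 1.02])

# Fit an ordinary-least-squares linear model as a stand-in for the paper's
# ML model; a bias column is appended to the features.
X_aug = np.hstack([X_train, np.ones((len(X_train), 1))])
w, *_ = np.linalg.lstsq(X_aug, y_train, rcond=None)

def predict_slowdown(a, b):
    """Predict colocation slowdown for a pair of application profiles."""
    x = np.concatenate([profiles[a], profiles[b], [1.0]])
    return float(x @ w)

# Scheduler step: among candidate pairings, colocate the pair with the
# lowest predicted degradation.
pairs = list(itertools.combinations(profiles, 2))
best = min(pairs, key=lambda p: predict_slowdown(*p))
print(best, round(predict_slowdown(*best), 2))
```

Under these invented profiles, the model steers the scheduler away from pairing the two memory-hungry applications, which is exactly the contention-avoidance behavior the paper's resource manager aims for.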
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, peer variability, and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes [0.46040036610482665]
We use Reinforcement Learning to design a power capping policy on cloud compute nodes.
We show how a trained agent running on actual hardware can take actions by balancing power consumption and application performance.
arXiv Detail & Related papers (2023-08-15T23:25:52Z)
- PBScaler: A Bottleneck-aware Autoscaling Framework for Microservice-based Applications [6.453782169615384]
We propose PBScaler, a bottleneck-aware autoscaling framework for microservice-based applications.
We show that PBScaler outperforms existing approaches while conserving resources efficiently.
arXiv Detail & Related papers (2023-03-26T04:20:17Z)
- Dynamic Resource Allocation for Metaverse Applications with Deep Reinforcement Learning [64.75603723249837]
This work proposes a novel framework to dynamically manage and allocate different types of resources for Metaverse applications.
We first propose an effective solution to divide applications into groups, namely MetaInstances, where common functions can be shared among applications.
Then, to capture the real-time, dynamic, and uncertain characteristics of request arrival and application departure processes, we develop a semi-Markov decision process-based framework.
arXiv Detail & Related papers (2023-02-27T00:30:01Z)
- Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
Processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications.
In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that does not only optimize for task accuracy and inference latency, but also for resource utilization.
We achieve 2.8 - 4x speedup for DNN inference compared to prior hardware-aware NAS methods.
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- Optimising Resource Management for Embedded Machine Learning [23.00896228073755]
Machine learning inference is increasingly being executed locally on mobile and embedded platforms.
We show approaches for online resource management in heterogeneous multi-core systems.
arXiv Detail & Related papers (2021-05-08T06:10:05Z)
- Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures [56.69373580921888]
We focus on Recommender Systems, which account for most of the AI cycles in cloud computing centers.
By enabling it to run on the latest CPU hardware and software tailored for HPC, we achieve more than two orders of magnitude improvement in performance.
arXiv Detail & Related papers (2020-05-10T14:40:16Z)
- The Case for Learning Application Behavior to Improve Hardware Energy Efficiency [2.4425948078034847]
We propose to use the harvested knowledge to tune hardware configurations.
Our proposed approach, called FORECASTER, uses a deep learning model to learn what configuration of hardware resources provides the optimal energy efficiency for a certain behavior of an application.
Our results show that FORECASTER can save as much as 18.4% system power over the baseline setup with all resources.
arXiv Detail & Related papers (2020-04-27T18:11:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.