New approach to MPI program execution time prediction
- URL: http://arxiv.org/abs/2007.15338v1
- Date: Thu, 30 Jul 2020 09:35:08 GMT
- Title: New approach to MPI program execution time prediction
- Authors: A. Chupakhin, A. Kolosov, R. Smeliansky, V. Antonenko, G. Ishelev
- Abstract summary: The problem of predicting MPI program execution time on a certain set of computer installations is considered.
This problem arises when orchestrating and provisioning a virtual infrastructure in a cloud computing environment.
The article proposes two new approaches to the program execution time prediction problem.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of predicting MPI program execution time on a certain
set of computer installations is considered. This problem arises when
orchestrating and provisioning a virtual infrastructure in a cloud computing
environment over a heterogeneous network of computer installations:
supercomputers or clusters of servers (e.g., mini data centers). One of the key
criteria for the effectiveness of the cloud computing environment is the time a
program spends inside the environment. This time consists of the waiting time
in the queue and the execution time on the selected physical computer
installation to which the computational resources of the virtual infrastructure
are dynamically mapped. One component of this problem is estimating the
execution time of MPI programs on a certain set of computer installations,
which is necessary to choose a proper order and place for program execution.
The article proposes two new approaches to the execution time prediction
problem. The first groups computer installations using the Pearson correlation
coefficient. The second relies on vector representations of computer
installations and MPI programs, so-called embeddings. The embedding technique
is widely used in recommendation systems, e.g., for goods (Amazon), articles
(Arxiv.org), and videos (YouTube, Netflix). The article shows how the embedding
technique helps predict the execution time of an MPI program on a certain set
of computer installations.
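To make the two proposed approaches concrete, two minimal Python sketches follow. They are illustrative reconstructions based only on the abstract, not the authors' implementation.

First approach (Pearson grouping): a minimal sketch, assuming installations are grouped by the Pearson correlation of their execution-time profiles over a shared set of benchmark MPI programs, and that a time measured on one installation in a group is transferred to another via the median benchmark ratio. The greedy grouping, the threshold value, and the helper names (`group_installations`, `predict_from_group`) are assumptions, not the paper's algorithm.

```python
import numpy as np

def group_installations(time_matrix, threshold=0.9):
    """Greedily group installations (columns of time_matrix, one row per
    benchmark MPI program) whose execution-time profiles have a pairwise
    Pearson correlation of at least `threshold`."""
    corr = np.corrcoef(time_matrix.T)  # installation-by-installation correlations
    n_inst = time_matrix.shape[1]
    groups, assigned = [], set()
    for i in range(n_inst):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, n_inst):
            if j not in assigned and corr[i, j] >= threshold:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups

def predict_from_group(time_matrix, measured_time, known_inst, target_inst):
    """Predict a new program's time on `target_inst` from its measured time on
    `known_inst` (same group), scaled by the median benchmark time ratio."""
    ratios = time_matrix[:, target_inst] / time_matrix[:, known_inst]
    return measured_time * np.median(ratios)

# Toy benchmark matrix: rows = MPI programs, columns = installations.
times = np.array([[10.0, 21.0, 55.0],
                  [ 4.0,  8.0, 30.0],
                  [ 7.0, 15.0, 41.0]])
print(group_installations(times, threshold=0.95))
print(predict_from_group(times, measured_time=6.0, known_inst=0, target_inst=1))
```

Highly correlated columns mean two installations rank and scale programs similarly, which is what makes a single measured time transferable within a group.

Second approach (embeddings): a hedged sketch of the embedding idea in the style of recommender-system matrix factorization, assuming each program and each installation gets a learned vector whose dot product (plus biases) approximates the log execution time of observed runs. The embedding dimension, loss, regularization, and SGD loop are assumptions rather than the model described in the paper.

```python
import numpy as np

def fit_embeddings(observations, n_programs, n_installations,
                   dim=8, lr=0.05, epochs=500, reg=1e-3, seed=0):
    """Learn embeddings P (programs) and Q (installations) plus biases so that
    P[p] . Q[q] + b_p[p] + b_q[q] approximates log(execution time) for each
    observed (program, installation, time) triple, via plain SGD."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_programs, dim))
    Q = 0.1 * rng.standard_normal((n_installations, dim))
    b_p = np.zeros(n_programs)
    b_q = np.zeros(n_installations)
    for _ in range(epochs):
        for p, q, t in observations:
            err = np.log(t) - (P[p] @ Q[q] + b_p[p] + b_q[q])
            grad_P = err * Q[q] - reg * P[p]
            grad_Q = err * P[p] - reg * Q[q]
            P[p] += lr * grad_P
            Q[q] += lr * grad_Q
            b_p[p] += lr * (err - reg * b_p[p])
            b_q[q] += lr * (err - reg * b_q[q])
    return P, Q, b_p, b_q

def predict_time(P, Q, b_p, b_q, p, q):
    """Predicted execution time (same units as the training data) of
    program p on installation q."""
    return float(np.exp(P[p] @ Q[q] + b_p[p] + b_q[q]))

# Observed (program, installation, execution time) triples; (2, 0) is held out.
obs = [(0, 0, 10.0), (0, 1, 21.0), (1, 0, 4.0), (1, 1, 8.0), (2, 1, 15.0)]
P, Q, bp, bq = fit_embeddings(obs, n_programs=3, n_installations=2)
print(predict_time(P, Q, bp, bq, p=2, q=0))  # program 2 on installation 0
```

The held-out pair in the example is predicted purely from the learned vectors, mirroring how a scheduler would estimate the time of a program on an installation it has not yet run on.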
Related papers
- DynaSplit: A Hardware-Software Co-Design Framework for Energy-Aware Inference on Edge [40.96858640950632]
We propose DynaSplit, a framework that dynamically configures parameters across both software and hardware.
We evaluate DynaSplit using popular pre-trained NNs on a real-world testbed.
Results show a reduction in energy consumption up to 72% compared to cloud-only computation.
arXiv Detail & Related papers (2024-10-31T12:44:07Z)
- Toward Smart Scheduling in Tapis [1.0377683220196874]
We present our efforts to develop an intelligent job scheduling capability in Tapis.
We focus on one such specific challenge: predicting queue times for a job on different HPC systems and queues.
Our first set of results casts the problem as a regression, which can be used to select the best system from a list of existing options.
arXiv Detail & Related papers (2024-08-05T20:01:31Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments [3.769144330511514]
We present XEngine, an approach that schedules network operators to heterogeneous devices in low memory environments.
Our solver finds solutions up to 22.5 % faster than the fastest Checkmate schedule in which the network is computed exclusively on a single device.
arXiv Detail & Related papers (2022-12-19T08:12:25Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- QParallel: Explicit Parallelism for Programming Quantum Computers [62.10004571940546]
We present a language extension for parallel quantum programming.
QParallel removes ambiguities concerning parallelism in current quantum programming languages.
We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv Detail & Related papers (2022-10-07T16:35:16Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
- Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback [29.177402567437206]
We present a partially observable (PO) model that captures the scheduling decisions in parallel queuing systems under limited information of delayed acknowledgements.
We numerically show that the resulting policy outperforms other limited information scheduling strategies.
We show how our approach can optimise the real-time parallel processing by using network data provided by Kaggle.
arXiv Detail & Related papers (2021-09-17T13:45:02Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Rosella: A Self-Driving Distributed Scheduler for Heterogeneous Clusters [7.206919625027208]
We present Rosella, a new self-driving, distributed approach for task scheduling in heterogeneous clusters.
Rosella automatically learns the compute environment and adjusts its scheduling policy in real-time.
We evaluate Rosella with a variety of workloads on a 32-node AWS cluster.
arXiv Detail & Related papers (2020-10-28T20:12:29Z)