Exploring Techniques for the Analysis of Spontaneous Asynchronicity in
MPI-Parallel Applications
- URL: http://arxiv.org/abs/2205.13963v1
- Date: Fri, 27 May 2022 13:19:07 GMT
- Title: Exploring Techniques for the Analysis of Spontaneous Asynchronicity in
MPI-Parallel Applications
- Authors: Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis
- Abstract summary: We run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms.
We show how desynchronization patterns can be readily identified from a data set that is much smaller than a full MPI trace.
- Score: 0.8889304968879161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the utility of using data analytics and machine learning
techniques for identifying, classifying, and characterizing the dynamics of
large-scale parallel (MPI) programs. To this end, we run microbenchmarks and
realistic proxy applications with the regular compute-communicate structure on
two different supercomputing platforms and choose the per-process performance
and MPI time per time step as relevant observables. Using principal component
analysis, clustering techniques, correlation functions, and a new "phase space
plot," we show how desynchronization patterns (or lack thereof) can be readily
identified from a data set that is much smaller than a full MPI trace. Our
methods also lead the way towards a more general classification of parallel
program dynamics.
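As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of principal component analysis applied to a matrix of per-process MPI time per time step. It is not the authors' implementation: the data are synthetic, the desynchronization pattern (half of the ranks accumulating extra wait time) is an assumption made for demonstration, and the function names `pca_scores` and `synthetic_mpi_times` are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X (processes x time steps) onto the leading
    principal components, computed via SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0, keepdims=True)  # center each time-step column
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


def synthetic_mpi_times(n_procs=32, n_steps=200, seed=0):
    """Hypothetical observable: MPI time per time step for each process.
    Assumption: the upper half of the ranks gradually accumulates extra
    waiting time, mimicking a simple desynchronization pattern."""
    rng = np.random.default_rng(seed)
    times = 1.0 + 0.01 * rng.standard_normal((n_procs, n_steps))
    times[n_procs // 2:] += np.linspace(0.0, 0.5, n_steps)
    return times


if __name__ == "__main__":
    X = synthetic_mpi_times()
    scores, explained = pca_scores(X)
    print("explained variance ratio:", explained)
    print("first-component score per rank:", np.round(scores[:, 0], 2))
```

In this toy setting the two groups of ranks end up on opposite sides of the first principal component, so a scatter plot of the scores would show two clusters. Real measurements would replace `synthetic_mpi_times` with one row per MPI process, which is still far less data than a full trace.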
Related papers
- Accelerating Point Cloud Ground Segmentation: From Mechanical to Solid-State Lidars [6.0753266069240235]
We first benchmark point-based, grid-based, and range image-based ground segmentation algorithms.
Our results indicate that the range image-based method offers superior performance and robustness.
Implementing the proposed algorithm on an FPGA demonstrates significant improvements in processing speed and scalability of resource usage.
arXiv Detail & Related papers (2024-08-19T20:39:21Z)
- Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets [53.367212596352324]
We propose an unsupervised approach leveraging EEG signal physics.
We map EEG channels to fixed positions using field interpolation, facilitating source-free domain adaptation.
Our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications.
arXiv Detail & Related papers (2024-03-07T16:17:33Z)
- KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction Approach [0.0]
HPC systems need to be constantly monitored to ensure their stability.
The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc.
A proper analysis of this data, usually stored as time series, can provide insight in choosing the right management strategies as well as the early detection of issues.
arXiv Detail & Related papers (2023-12-11T17:13:54Z)
- Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation [7.750212995537728]
We propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code.
We create an OMP_Serial dataset with 18598 parallelizable and 13972 non-parallelizable loops to train the machine learning models.
Our results show that the proposed approach detects parallelizable code regions with 85% accuracy and outperforms the state-of-the-art token-based machine learning approach.
arXiv Detail & Related papers (2023-05-09T21:57:15Z)
- Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR [1.2507285499419876]
We present an automatic partitioner that identifies efficient combinations for many model architectures and accelerator systems.
Our key findings are that a Monte Carlo Tree Search-based partitioner leveraging partition-specific compiler analysis directly into the search and guided goals matches expert-level strategies for various models.
arXiv Detail & Related papers (2022-10-07T17:46:46Z)
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems [32.621917787044396]
We introduce MLPerf HPC, a benchmark suite of scientific machine learning training applications driven by the MLCommons™ Association.
We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance.
We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior.
arXiv Detail & Related papers (2021-10-21T20:30:12Z)
- Improving Video Instance Segmentation via Temporal Pyramid Routing [61.10753640148878]
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment and track each instance in a video sequence.
We propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
Our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods.
arXiv Detail & Related papers (2021-07-28T03:57:12Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.