Related papers: Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

URL: http://arxiv.org/abs/2205.13963v1
Date: Fri, 27 May 2022 13:19:07 GMT
Title: Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications
Authors: Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis
Abstract summary: We run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms. We show how desynchronization patterns can be readily identified from a data set that is much smaller than a full MPI trace.
Score: 0.8889304968879161
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
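As a rough illustration of the principal component analysis step mentioned in the abstract, the sketch below applies PCA to a matrix of per-process MPI time per time step. It is not the authors' code: the data are synthetic, the helper names (`make_synthetic`, `pca_explained_variance`) are hypothetical, and the traveling-wave model of desynchronization is an assumption made only for this example.

```python
import numpy as np


def make_synthetic(n_steps=200, n_procs=16, wave=False, seed=0):
    """Synthetic MPI time per time step (rows) and per process (columns).

    Assumption for illustration only: a desynchronized run is modeled as a
    delay wave traveling across ranks; a synchronized run is noise around 1.0.
    """
    rng = np.random.default_rng(seed)
    noise = 0.01 * rng.standard_normal((n_steps, n_procs))
    if not wave:
        return 1.0 + noise
    t = np.arange(n_steps)[:, None]
    p = np.arange(n_procs)[None, :]
    return 1.0 + 0.5 * np.sin(2 * np.pi * (t / 50 - p / n_procs)) + noise


def pca_explained_variance(mpi_time):
    """Return the explained-variance ratios and the principal axes."""
    centered = mpi_time - mpi_time.mean(axis=0)  # remove each process's mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    return var / var.sum(), vt


ratio_sync, _ = pca_explained_variance(make_synthetic(wave=False))
ratio_wave, _ = pca_explained_variance(make_synthetic(wave=True))

print("synchronized, top-2 variance share:", ratio_sync[:2].sum())
print("traveling wave, top-2 variance share:", ratio_wave[:2].sum())
```

In this toy setup, a traveling wave concentrates almost all variance in two components, while uncorrelated noise spreads it across all of them. Whether real traces separate this cleanly depends on the application and platform.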

Related papers

Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering [22.545533166145706]
We introduce the Forecaster with Offline Clustering Using Segments (FOCUS). FOCUS is a novel approach to MTS forecasting that simplifies long-range dependency modeling. It achieves state-of-the-art accuracy while significantly reducing computational costs.
arXiv Detail & Related papers (2025-05-09T02:34:06Z)
Accelerating Point Cloud Ground Segmentation: From Mechanical to Solid-State Lidars [6.0753266069240235]
We first benchmark point-based, grid-based, and range image-based ground segmentation algorithms. Our results indicate that the range image-based method offers superior performance and robustness. Implementing the proposed algorithm on an FPGA demonstrates significant improvements in processing speed and scalability of resource usage.
arXiv Detail & Related papers (2024-08-19T20:39:21Z)
Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance. We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features. In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets [53.367212596352324]
We propose an unsupervised approach leveraging EEG signal physics. We map EEG channels to fixed positions using field, source-free domain adaptation. Our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications.
arXiv Detail & Related papers (2024-03-07T16:17:33Z)
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything [117.02741621686677]
This work explores a novel real-time segmentation setting called real-time multi-purpose segmentation. It contains three fundamental sub-tasks: interactive segmentation, panoptic segmentation, and video instance segmentation. We present a novel dynamic convolution-based method, Real-Time Multi-Purpose SAM (RMP-SAM). It contains an efficient encoder and an efficient decoupled adapter to perform prompt-driven decoding.
arXiv Detail & Related papers (2024-01-18T18:59:30Z)
KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction Approach [0.0]
HPC systems need to be constantly monitored to ensure their stability. The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc. A proper analysis of this data, usually stored as time series, can provide insight in choosing the right management strategies as well as the early detection of issues.
arXiv Detail & Related papers (2023-12-11T17:13:54Z)
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation [7.750212995537728]
We propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code. We create an OMP_Serial dataset with 18598 parallelizable and 13972 non-parallelizable loops to train the machine learning models. Our results show that our proposed approach detects parallelizable code regions with 85% accuracy and outperforms the state-of-the-art token-based machine learning approach.
arXiv Detail & Related papers (2023-05-09T21:57:15Z)
Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR [1.2507285499419876]
We present an automatic partitioner that identifies efficient combinations for many model architectures and accelerator systems. Our key findings are that a Monte Carlo Tree Search-based partitioner leveraging partition-specific compiler analysis directly into the search and guided goals matches expert-level strategies for various models.
arXiv Detail & Related papers (2022-10-07T17:46:46Z)
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems [32.621917787044396]
We introduce MLPerf HPC, a benchmark suite of scientific machine learning training applications driven by the MLCommons™ Association. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior.
arXiv Detail & Related papers (2021-10-21T20:30:12Z)
Improving Video Instance Segmentation via Temporal Pyramid Routing [61.10753640148878]
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment and track each instance in a video sequence. We propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames. Our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods.
arXiv Detail & Related papers (2021-07-28T03:57:12Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
