OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI
Libraries on HPC Systems
- URL: http://arxiv.org/abs/2110.10659v1
- Date: Wed, 20 Oct 2021 16:59:14 GMT
- Title: OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI
Libraries on HPC Systems
- Authors: Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, and
Dhabaleswar K Panda
- Abstract summary: OMB-Py is the first communication benchmark suite for parallel Python applications.
OMB-Py consists of a variety of point-to-point and collective communication benchmark tests.
We report up to 106x speedup on 224 CPU cores compared to sequential execution.
- Score: 1.066106854070245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Python has become a dominant programming language for emerging areas like
Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive
feature of Python is that it provides an easy-to-use programming interface while
allowing library developers to enhance performance of their applications by
harnessing the computing power offered by High Performance Computing (HPC)
platforms. Efficient communication is key to scaling applications on parallel
systems, which is typically enabled by the Message Passing Interface (MPI)
standard and compliant libraries on HPC hardware. mpi4py is a Python-based
communication library that provides an MPI-like interface for Python
applications allowing application developers to utilize parallel processing
elements including GPUs. However, there is currently no benchmark suite to
evaluate communication performance of mpi4py -- and Python MPI codes in general
-- on modern HPC systems. In order to bridge this gap, we propose OMB-Py --
Python extensions to the open-source OSU Micro-Benchmark (OMB) suite -- aimed
at evaluating the communication performance of MPI-based parallel applications in
Python. To the best of our knowledge, OMB-Py is the first communication
benchmark suite for parallel Python applications. OMB-Py consists of a variety
of point-to-point and collective communication benchmark tests that are
implemented for a range of popular Python libraries including NumPy, CuPy,
Numba, and PyCUDA. We also provide Python implementations of several
distributed ML algorithms as benchmarks to understand the potential gain in
performance for ML/DL workloads. Our evaluation reveals that mpi4py introduces
a small overhead when compared to native MPI libraries. We also evaluate the
ML/DL workloads and report up to 106x speedup on 224 CPU cores compared to
sequential execution. We plan to publicly release OMB-Py to benefit the Python HPC
community.
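
To make the idea of a point-to-point benchmark concrete, here is a minimal ping-pong latency sketch written against mpi4py's buffer-based `Send`/`Recv` calls. It is not taken from OMB-Py: the function names (`one_way_latency_us`, `pingpong`), the iteration and warm-up counts, the message sizes, and the use of plain `bytearray` buffers are all assumptions chosen for illustration. The benchmarks described in the abstract also cover NumPy, CuPy, Numba, and PyCUDA buffers, which this sketch does not attempt.

```python
try:
    from mpi4py import MPI  # optional: only needed to actually run the benchmark
except ImportError:
    MPI = None


def one_way_latency_us(elapsed_s, iterations):
    """Convert the total time of `iterations` round trips into the
    average one-way latency in microseconds."""
    return elapsed_s * 1e6 / (2 * iterations)


def pingpong(comm, nbytes, iterations=1000, warmup=100):
    """Bounce an nbytes message between ranks 0 and 1 and return the
    average one-way latency in microseconds (hypothetical helper)."""
    rank = comm.Get_rank()
    peer = 1 - rank
    send_buf = bytearray(nbytes)
    recv_buf = bytearray(nbytes)

    comm.Barrier()
    start = 0.0
    for i in range(warmup + iterations):
        if i == warmup:
            # Start timing only after the warm-up iterations.
            start = MPI.Wtime()
        if rank == 0:
            comm.Send([send_buf, MPI.BYTE], dest=peer, tag=1)
            comm.Recv([recv_buf, MPI.BYTE], source=peer, tag=1)
        else:
            comm.Recv([recv_buf, MPI.BYTE], source=peer, tag=1)
            comm.Send([send_buf, MPI.BYTE], dest=peer, tag=1)
    elapsed = MPI.Wtime() - start
    return one_way_latency_us(elapsed, iterations)


def main():
    if MPI is None:
        print("mpi4py is not installed; skipping the benchmark")
        return
    comm = MPI.COMM_WORLD
    if comm.Get_size() != 2:
        if comm.Get_rank() == 0:
            print("this sketch expects exactly two MPI ranks")
        return
    # Assumed message sizes: 1 B up to 1 MiB in powers of 16.
    for exponent in range(0, 21, 4):
        nbytes = 2 ** exponent
        latency = pingpong(comm, nbytes)
        if comm.Get_rank() == 0:
            print(f"{nbytes:>8d} B  {latency:10.2f} us")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `pingpong.py`, the sketch would typically be launched with two ranks, for example `mpiexec -n 2 python pingpong.py`. Halving the round-trip time gives the one-way latency, and the warm-up iterations keep start-up effects out of the timed region.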
Related papers
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
- PyMarian: Fast Neural Machine Translation and Evaluation in Python [11.291502854418098]
We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models.
This interface enables models trained with Marian to be connected to the rich, wide range of tools available in Python.
arXiv Detail & Related papers (2024-08-15T01:41:21Z)
- DyPyBench: A Benchmark of Executable Python Software [18.129031749321058]
We present DyPyBench, the first benchmark of Python projects that is large scale, diverse, ready to run and ready to analyze.
The benchmark encompasses 50 popular open-source projects from various application domains, with a total of 681k lines of Python code and 30k test cases.
We envision DyPyBench to provide a basis for other dynamic analyses and for studying the runtime behavior of Python code.
arXiv Detail & Related papers (2024-03-01T13:53:15Z)
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the popular general-purpose ChatGPT and the targeted PragFormer models.
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- QParallel: Explicit Parallelism for Programming Quantum Computers [62.10004571940546]
We present a language extension for parallel quantum programming.
QParallel removes ambiguities concerning parallelism in current quantum programming languages.
We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv Detail & Related papers (2022-10-07T16:35:16Z)
- Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
- Extending Python for Quantum-Classical Computing via Quantum Just-in-Time Compilation [78.8942067357231]
Python is a popular programming language known for its flexibility, usability, readability, and focus on developer productivity.
We present a language extension to Python that enables heterogeneous quantum-classical computing via a robust C++ infrastructure for quantum just-in-time compilation.
arXiv Detail & Related papers (2021-05-10T21:11:21Z)
- Python Workflows on HPC Systems [2.1485350418225244]
The recent successes and widespread application of compute-intensive machine learning and data analytics methods have been boosting the usage of the Python programming language on HPC systems.
While Python provides many advantages for the users, it has not been designed with a focus on multi-user environments or parallel programming.
In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds.
arXiv Detail & Related papers (2020-12-01T09:51:12Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.