Learning to Parallelize in a Shared-Memory Environment with Transformers
- URL: http://arxiv.org/abs/2204.12835v1
- Date: Wed, 27 Apr 2022 10:39:52 GMT
- Title: Learning to Parallelize in a Shared-Memory Environment with Transformers
- Authors: Re'em Harel, Yuval Pinter, Gal Oren
- Abstract summary: OpenMP is the most comprehensive API that implements shared memory parallelization schemes.
Many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically.
In this work, we propose leveraging recent advances in ML techniques, specifically in natural language processing (NLP), to replace S2S compilers altogether.
- Score: 3.340971990034025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the computing world has shifted to many-core and multi-core shared-memory architectures.
As a result, there is a growing need to utilize these architectures by
introducing shared-memory parallelization schemes into software applications.
OpenMP is the most comprehensive API that implements such schemes,
characterized by a readable interface. Nevertheless, introducing OpenMP into
code is challenging due to pervasive pitfalls in the management of parallel shared
memory. To ease this task, many source-to-source (S2S)
compilers have been created over the years, tasked with inserting OpenMP
directives into code automatically.
Besides their limited robustness to input formats, these
compilers still do not achieve satisfactory coverage or precision in locating
parallelizable code and generating appropriate directives.
In this work, we propose leveraging recent advances in ML techniques,
specifically in natural language processing (NLP), to replace S2S compilers
altogether.
We create a database (corpus), Open-OMP, specifically for this goal. Open-OMP
contains over 28,000 code snippets, half of which contain OpenMP directives,
while the other half, with high probability, do not require parallelization at all.
We use the corpus to train systems to automatically classify code segments in
need of parallelization, as well as suggest individual OpenMP clauses.
We train several transformer models, named PragFormer, for these tasks, and
show that they outperform statistically-trained baselines and automatic S2S
parallelization compilers in both classifying the overall need for an OpenMP
directive and the introduction of private and reduction clauses.
Our source code and database are available at:
https://github.com/Scientific-Computing-Lab-NRCN/PragFormer.
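To make the prediction targets concrete, below is a minimal, hand-written C sketch (not drawn from the Open-OMP corpus) of the kind of loop PragFormer classifies as parallelizable, together with the private and reduction clauses it may suggest; the variable names and the computation are illustrative only.

```c
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;
    double tmp;

    /* The directive on the next line is the kind of target a PragFormer-style
     * model predicts for a serial loop: the loop should be parallelized,
     * 'tmp' must be private to each thread, and 'sum' is combined with a
     * reduction. */
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = (double)i * 0.5;   /* per-thread scratch value */
        sum += tmp;              /* safely accumulated via reduction(+:sum) */
    }

    printf("sum = %f\n", sum);
    return 0;
}
```

Compiled with OpenMP support (e.g., gcc -fopenmp), the loop runs in parallel; without the directive the same code runs serially, which is exactly the contrast the classifier is trained to detect.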
Related papers
- OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation [4.266086505323998]
This paper introduces OMPar, an AI-driven tool designed to automate the parallelization of C/C++ code using OpenMP pragmas.
OMPar integrates Large Language Models (LLMs) through two key components: OMPify, which assesses loop parallelization potential, and MonoCoder-OMP, a new fine-tuned model which generates precise OpenMP pragmas.
arXiv Detail & Related papers (2024-09-23T07:39:01Z)
- Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context-length limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
- MPIrigen: MPI Code Generation through Domain-Specific Language Models [3.5352856644774806]
This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs.
We introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI.
The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation.
arXiv Detail & Related papers (2024-02-14T12:24:21Z)
- Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach, originally developed for information retrieval, to LLM compression.
We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
arXiv Detail & Related papers (2024-01-11T18:54:44Z)
- L2MAC: Large Language Model Automatic Computer for Extensive Code Generation [52.81694565226513]
Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture.
This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, for long and consistent output generation.
arXiv Detail & Related papers (2023-10-02T16:55:19Z)
- Exploring Continual Learning for Code Generation Models [80.78036093054855]
Continual Learning (CL) is an important aspect that remains underexplored in the code domain.
We introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement.
We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism.
arXiv Detail & Related papers (2023-07-05T16:58:39Z)
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the general-purpose and popular ChatGPT and the targeted PragFormer models.
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
- MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers [3.2164100882807913]
Message Passing Interface (MPI) plays a crucial role in distributed memory parallelization across multiple nodes.
We develop MPI-RICAL, a data-driven programming-assistance tool that helps programmers write domain-decomposition-based distributed-memory parallelization code.
We also introduce MPICodeCorpus, the first publicly available corpus of MPI-based parallel programs that is created by mining more than 15,000 open-source repositories on GitHub.
arXiv Detail & Related papers (2023-05-16T13:50:24Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- QParallel: Explicit Parallelism for Programming Quantum Computers [62.10004571940546]
We present a language extension for parallel quantum programming.
QParallel removes ambiguities concerning parallelism in current quantum programming languages.
We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv Detail & Related papers (2022-10-07T16:35:16Z)