Learning to Parallelize with OpenMP by Augmented Heterogeneous AST
Representation
- URL: http://arxiv.org/abs/2305.05779v1
- Date: Tue, 9 May 2023 21:57:15 GMT
- Title: Learning to Parallelize with OpenMP by Augmented Heterogeneous AST
Representation
- Authors: Le Chen, Quazi Ishtiaque Mahmud, Hung Phan, Nesreen K. Ahmed, Ali
Jannesari
- Abstract summary: We propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code.
We create an OMP_Serial dataset with 18598 parallelizable and 13972 non-parallelizable loops to train the machine learning models.
Our results show that our proposed approach detects parallelizable code regions with 85% accuracy and outperforms the state-of-the-art token-based machine learning approach.
- Score: 7.750212995537728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting parallelizable code regions is a challenging task, even for
experienced developers. Numerous recent studies have explored the use of
machine learning for code analysis and program synthesis, including
parallelization, in light of the success of machine learning in natural
language processing. However, applying machine learning techniques to
parallelism detection presents several challenges, such as the lack of an
adequate dataset for training, an effective code representation with rich
information, and a suitable machine learning model to learn the latent features
of code for diverse analyses. To address these challenges, we propose a novel
graph-based learning approach called Graph2Par that utilizes a heterogeneous
augmented abstract syntax tree (Augmented-AST) representation for code. The
proposed approach primarily focuses on loop-level parallelization with OpenMP.
Moreover, we create an OMP_Serial dataset with 18598 parallelizable and 13972
non-parallelizable loops to train the machine learning models. Our results show
that our proposed approach detects parallelizable code regions with 85%
accuracy and outperforms the state-of-the-art token-based machine learning
approach. These results indicate that our approach is
competitive with state-of-the-art tools and capable of handling loops with
complex structures that other tools may overlook.
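To make the task concrete, the following minimal C sketch (our own illustration, not drawn from the paper or the OMP_Serial dataset) shows the distinction such a classifier must learn: the first loop has independent iterations and can carry an OpenMP pragma, while the second has a loop-carried dependence and cannot be parallelized as written.

    #include <stdio.h>

    #define N 1000

    int main(void) {
        static double a[N], b[N];

        /* Parallelizable: each iteration writes its own a[i] and reads only
           b[i], so iterations are independent and the loop can be annotated. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = 2.0 * b[i] + 1.0;
        }

        /* Not parallelizable as written: iteration i reads a[i-1], which is
           produced by the previous iteration (a loop-carried dependence), so
           the iterations cannot safely run concurrently. */
        for (int i = 1; i < N; i++) {
            a[i] = a[i - 1] + b[i];
        }

        printf("%f\n", a[N - 1]);
        return 0;
    }

Compiled with OpenMP enabled (e.g., gcc -fopenmp), the pragma distributes the first loop's iterations across threads; without the flag, the pragma is ignored and the code runs serially.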
Related papers
- OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation [4.266086505323998]
This paper introduces OMPar, an AI-driven tool designed to automate the parallelization of C/C++ code using OpenMP pragmas.
OMPar integrates Large Language Models (LLMs) through two key components: OMPify, which assesses loop parallelization potential, and MonoCoder-OMP, a new fine-tuned model which generates precise OpenMP pragmas.
arXiv Detail & Related papers (2024-09-23T07:39:01Z)
- Topological Methods in Machine Learning: A Tutorial for Practitioners [4.297070083645049]
Topological Machine Learning (TML) is an emerging field that leverages techniques from algebraic topology to analyze complex data structures.
This tutorial provides a comprehensive introduction to two key TML techniques, persistent homology and the Mapper algorithm.
To enhance accessibility, we adopt a data-centric approach, enabling readers to gain hands-on experience applying these techniques to relevant tasks.
arXiv Detail & Related papers (2024-09-04T17:44:52Z)
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the general-purpose and popular ChatGPT and the targeted PragFormer model (a minimal example of the kind of pragma such tools predict appears after this list).
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
- ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels [1.304892050913381]
We introduce a new graph-based program representation for parallel applications that extends the Abstract Syntax Tree.
We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of an OpenMP code region.
Results show that our approach is effective, with a normalized RMSE ranging from 0.004 to at most 0.01 in its runtime predictions.
arXiv Detail & Related papers (2023-04-07T05:52:59Z)
- GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access [71.58925117604039]
Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks.
We propose a neural network architecture that combines the advantages of both linear and non-linear processing.
arXiv Detail & Related papers (2022-06-13T09:38:23Z)
- Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications [0.8889304968879161]
We run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms.
We show how desynchronization patterns can be readily identified from a data set that is much smaller than a full MPI trace.
arXiv Detail & Related papers (2022-05-27T13:19:07Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Typical machine learning workflows, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
arXiv Detail & Related papers (2019-12-27T21:28:10Z)
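For readers unfamiliar with the shared-memory attributes that tools such as OMPify and OMPar aim to predict, the following C sketch (our own illustration, not output from either tool) shows a reduction loop and the clause its pragma needs to avoid a data race on the accumulator.

    #include <stdio.h>

    #define N 1000

    int main(void) {
        static double x[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            x[i] = (double)i;

        /* The accumulation into sum is a reduction: a plain
           "#pragma omp parallel for" would let threads race on sum, so the
           pragma also needs a reduction clause describing how sum is shared
           and how the per-thread partial sums are combined. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            sum += x[i];
        }

        printf("sum = %f\n", sum);
        return 0;
    }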
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.