AUTOPARLLM: GNN-Guided Automatic Code Parallelization using Large
Language Models
- URL: http://arxiv.org/abs/2310.04047v2
- Date: Mon, 9 Oct 2023 02:35:47 GMT
- Title: AUTOPARLLM: GNN-Guided Automatic Code Parallelization using Large
Language Models
- Authors: Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan, Nesreen K.
Ahmed and Ali Jannesari
- Abstract summary: AUTOPARLLM is a framework for automatically discovering parallelism and generating the parallel version of sequentially written programs.
We evaluate it on 11 applications from two well-known benchmark suites: the NAS Parallel Benchmark and the Rodinia Benchmark.
Our results show that AUTOPARLLM is effective in improving state-of-the-art LLM-based models on the task of parallel code generation.
- Score: 13.514916184776107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parallelizing sequentially written programs is a challenging task. Even
experienced developers need to spend considerable time finding parallelism
opportunities and then actually writing parallel versions of sequentially
written programs. To address this issue, we present AUTOPARLLM, a framework for
automatically discovering parallelism and generating the parallel version of
the sequentially written program. Our framework consists of two major
components: i) a heterogeneous Graph Neural Network (GNN) based parallelism
discovery and parallel pattern detection module, and ii) an LLM-based code
generator to generate the parallel counterpart of the sequential programs. We
use the GNN to learn the flow-aware characteristics of the programs to identify
parallel regions in sequential programs and then construct an enhanced prompt
using the GNN's results for the LLM-based generator to finally produce the
parallel counterparts of the sequential programs. We evaluate AUTOPARLLM on 11
applications from two well-known benchmark suites: the NAS Parallel Benchmark and
the Rodinia Benchmark. Our results show that AUTOPARLLM is effective in
improving the state-of-the-art LLM-based models for the task of parallel code
generation in terms of multiple code generation metrics. AUTOPARLLM also
improves the average runtime of the parallel code generated by the
state-of-the-art LLMs by up to 3.4% and 2.9% for the NAS Parallel
Benchmark and Rodinia Benchmark respectively. Additionally, to overcome the
issue that well-known metrics for translation evaluation have not been
optimized to evaluate the quality of the generated parallel code, we propose
OMPScore for evaluating the quality of the generated code. We show that
OMPScore exhibits a better correlation with human judgment than existing
metrics, with up to a 75% improvement in Spearman correlation.
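
To make the two-stage design concrete, here is a minimal Python sketch of the GNN-guided flow: a discovery stage that labels loops as parallel (with a pattern such as do-all or reduction), and a prompt builder that folds those labels into the instruction given to the LLM-based generator. Every name, signature, and the prompt wording below are illustrative assumptions rather than the paper's actual interface.

# A minimal sketch of the two-stage flow described above. All names
# (RegionPrediction, discover_parallelism, build_enhanced_prompt) and the
# prompt wording are illustrative assumptions, not AUTOPARLLM's actual API.
from dataclasses import dataclass
from typing import List

@dataclass
class RegionPrediction:
    """One finding from the (hypothetical) heterogeneous-GNN discovery stage."""
    loop_id: str      # which loop in the sequential program
    parallel: bool    # whether the region is parallelizable
    pattern: str      # detected parallel pattern, e.g. "do-all" or "reduction"

def discover_parallelism(flow_graph) -> List[RegionPrediction]:
    """Stand-in for the trained GNN that learns flow-aware program features
    and flags parallel regions together with their pattern."""
    raise NotImplementedError("placeholder for the GNN inference step")

def build_enhanced_prompt(source_code: str, preds: List[RegionPrediction]) -> str:
    """Fold the GNN's findings into the prompt, so the LLM only has to
    generate the parallel version rather than also discover the parallelism."""
    hints = "\n".join(
        f"- loop {p.loop_id}: parallelizable, pattern = {p.pattern}"
        for p in preds if p.parallel
    )
    return ("Parallelize the following C code with OpenMP.\n"
            "Known-parallel regions (from static analysis):\n"
            + hints + "\n\n" + source_code)

if __name__ == "__main__":
    seq_loop = "for (int i = 0; i < n; i++) sum += a[i] * b[i];"
    preds = [RegionPrediction(loop_id="L1", parallel=True, pattern="reduction")]
    print(build_enhanced_prompt(seq_loop, preds))
    # An LLM-based generator given this prompt would be expected to emit, e.g.,
    # "#pragma omp parallel for reduction(+:sum)" above the loop.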
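
The metric-versus-human comparison behind the OMPScore claim reduces to an ordinary rank-correlation computation. The snippet below illustrates only that evaluation recipe using scipy's spearmanr; the scores are made-up placeholders and OMPScore itself is not implemented here.

# Sketch of how agreement between an automatic metric and human judgment is
# typically measured. The numbers are invented placeholders, not paper results.
from scipy.stats import spearmanr

human_ratings = [4, 2, 5, 3, 1]                  # human quality scores per generated program
metric_scores = [0.81, 0.40, 0.93, 0.55, 0.20]   # e.g. OMPScore (or BLEU/CodeBLEU) values

rho, p_value = spearmanr(human_ratings, metric_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A metric whose rho is closer to 1 tracks human judgment more closely; the
# abstract reports up to a 75% higher Spearman correlation for OMPScore than
# for existing translation metrics.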
Related papers
- Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs [1.8911962184174564]
We propose a cheaper alternative bilinear operator to matrix-multiplication in deep neural networks (DNNs)
We show that replacing all linear layers with STL and training from scratch results in a 2.7x reduction in FLOPs with a 0.5% accuracy improvement.
Fine-tuning TinyLlama with STL layers on the SlimPajama dataset achieves accuracy similar to 2:4 sparsity, with a 2.2x FLOP speedup compared to 1.7x for the latter.
arXiv Detail & Related papers (2025-03-15T17:31:36Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - ParallelSpec: Parallel Drafter for Efficient Speculative Decoding [62.68430939686566]
We present ParallelSpec, an alternative to auto-regressive drafting strategies in state-of-the-art speculative decoding approaches.
In contrast to auto-regressive drafting in the speculative stage, we train a parallel drafter to serve as an efficient speculative model.
arXiv Detail & Related papers (2024-10-08T01:05:08Z) - OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation [4.266086505323998]
This paper introduces OMPar, an AI-driven tool designed to automate the parallelization of C/C++ code using OpenMP pragmas.
OMPar integrates Large Language Models (LLMs) through two key components: OMPify, which assesses loop parallelization potential, and MonoCoder-OMP, a new fine-tuned model which generates precise OpenMP pragmas.
arXiv Detail & Related papers (2024-09-23T07:39:01Z) - Generating Unseen Code Tests In Infinitum [1.0674604700001968]
We present a method for creating benchmark variations that generalize across coding tasks and programming languages.
We implement one benchmark, called auto-regression, for the task of text-to-code generation in Python.
arXiv Detail & Related papers (2024-07-29T08:11:20Z) - SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices [18.81344021150902]
SpecExec is a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families.
We demonstrate inference of 50B+ parameter LLMs on consumer GPUs with RAM offloading at 4-6 tokens per second with 4-bit quantization or 2-3 tokens per second with 16-bit weights.
arXiv Detail & Related papers (2024-06-04T17:53:36Z) - Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM's generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z) - Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [19.167604927651073]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance.
We propose a novel parallel prompt decoding method that requires only 0.0002% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours.
Our approach demonstrates up to a 2.49x speedup while maintaining a minimal memory overhead of just 0.0004%.
arXiv Detail & Related papers (2024-05-28T22:19:30Z) - MPIrigen: MPI Code Generation through Domain-Specific Language Models [3.5352856644774806]
This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs.
We introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI.
The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation.
arXiv Detail & Related papers (2024-02-14T12:24:21Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding.
We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z) - Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending context window of large language models (LLMs) is getting popular recently.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z) - Advising OpenMP Parallelization via a Graph-Based Approach with
Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the general-purpose and popular ChatGPT and the targeted PragFormer model.
arXiv Detail & Related papers (2023-05-16T16:56:10Z) - Learning to Parallelize with OpenMP by Augmented Heterogeneous AST
Representation [7.750212995537728]
We propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code.
We create an OMP_Serial dataset with 18,598 parallelizable and 13,972 non-parallelizable loops to train the machine learning models.
Our results show that the proposed approach detects parallelizable code regions with 85% accuracy and outperforms the state-of-the-art token-based machine learning approach.
arXiv Detail & Related papers (2023-05-09T21:57:15Z) - Inference with Reference: Lossless Acceleration of Large Language Models [97.04200102556551]
LLMA is an accelerator to speed up Large Language Model (LLM) inference with references.
It is motivated by the observation that there are abundant identical text spans between an LLM's decoding output and reference text that is available in many real-world scenarios.
arXiv Detail & Related papers (2023-04-10T09:55:14Z) - A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter
Dataset [53.73316523766183]
We introduce a robust semantic frame parsing pipeline that can handle both OOD patterns and OOV tokens.
We also build an end-to-end (E2E) application to demonstrate the feasibility of our algorithm and show why it is useful in real applications.
arXiv Detail & Related papers (2022-12-18T01:59:49Z) - Simplifying and Understanding State Space Models with Diagonal Linear
RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs.
We empirically show that, despite being conceptually much simpler, DLR is as performant as previously proposed SSMs.
We also characterize the expressivity of SSMs and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z) - QParallel: Explicit Parallelism for Programming Quantum Computers [62.10004571940546]
We present a language extension for parallel quantum programming.
QParallel removes ambiguities concerning parallelism in current quantum programming languages.
We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv Detail & Related papers (2022-10-07T16:35:16Z) - MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical
Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)