AUTOPARLLM: GNN-Guided Automatic Code Parallelization using Large
Language Models
- URL: http://arxiv.org/abs/2310.04047v2
- Date: Mon, 9 Oct 2023 02:35:47 GMT
- Title: AUTOPARLLM: GNN-Guided Automatic Code Parallelization using Large
Language Models
- Authors: Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan, Nesreen K.
Ahmed and Ali Jannesari
- Abstract summary: AUTOPARLLM is a framework for automatically discovering parallelism and generating the parallel version of sequentially written programs.
We evaluate it on 11 applications of 2 well-known benchmark suites: NAS Parallel Benchmark and Rodinia Benchmark.
Our results show that AUTOPARLLM is indeed effective in improving the state-of-the-art LLM-based models for the task of parallel code generation.
- Score: 13.514916184776107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parallelizing sequentially written programs is a challenging task. Even
experienced developers need to spend considerable time finding parallelism
opportunities and then actually writing parallel versions of sequentially
written programs. To address this issue, we present AUTOPARLLM, a framework for
automatically discovering parallelism and generating the parallel version of
the sequentially written program. Our framework consists of two major
components: i) a heterogeneous Graph Neural Network (GNN) based parallelism
discovery and parallel pattern detection module, and ii) an LLM-based code
generator to generate the parallel counterpart of the sequential programs. We
use the GNN to learn the flow-aware characteristics of the programs to identify
parallel regions in sequential programs and then construct an enhanced prompt
using the GNN's results for the LLM-based generator to finally produce the
parallel counterparts of the sequential programs. We evaluate AUTOPARLLM on 11
applications of 2 well-known benchmark suites: NAS Parallel Benchmark and
Rodinia Benchmark. Our results show that AUTOPARLLM is indeed effective in
improving the state-of-the-art LLM-based models for the task of parallel code
generation in terms of multiple code generation metrics. AUTOPARLLM also
improves the average runtime of the parallel code generated by the
state-of-the-art LLMs by as high as 3.4% and 2.9% for the NAS Parallel
Benchmark and Rodinia Benchmark respectively. Additionally, to overcome the
issue that well-known metrics for translation evaluation have not been
optimized to evaluate the quality of the generated parallel code, we propose
OMPScore for evaluating the quality of the generated code. We show that
OMPScore exhibits a better correlation with human judgment than existing
metrics, measured by up to 75% improvement of Spearman correlation.
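As a rough illustration of the prompt-construction step described in the abstract, the sketch below shows how GNN predictions about parallel regions might be folded into an enhanced prompt for an LLM-based code generator. The RegionHint structure, build_enhanced_prompt, and the gnn / llm_generate interfaces are hypothetical placeholders for illustration, not the authors' actual API.

```python
# A minimal sketch of GNN-guided prompt construction, assuming a GNN that
# predicts, per loop, whether it is parallelizable and which parallel pattern
# it follows. All names below are illustrative placeholders.
from dataclasses import dataclass
from typing import List


@dataclass
class RegionHint:
    """One GNN prediction for a candidate loop in the sequential program."""
    loop_id: str     # e.g. the source location of the loop header
    parallel: bool   # does the GNN consider this loop parallelizable?
    pattern: str     # predicted parallel pattern, e.g. "do-all" or "reduction"


def build_enhanced_prompt(source_code: str, hints: List[RegionHint]) -> str:
    """Fold the GNN's parallelism and pattern predictions into the LLM prompt."""
    guidance = "\n".join(
        f"- Loop at {h.loop_id}: parallelizable, {h.pattern} pattern"
        for h in hints
        if h.parallel
    )
    return (
        "Parallelize the following C program with OpenMP.\n"
        "Only the loops listed below were identified as parallel regions; "
        "leave every other loop sequential.\n\n"
        f"{guidance}\n\n"
        "Sequential code:\n"
        f"{source_code}\n"
    )


def autoparallelize(source_code: str, gnn, llm_generate) -> str:
    """End-to-end sketch: GNN discovery -> enhanced prompt -> LLM generation."""
    hints = gnn.predict(source_code)     # flow-aware heterogeneous GNN predictions
    prompt = build_enhanced_prompt(source_code, hints)
    return llm_generate(prompt)          # any code LLM used as the generator
```

Validating a metric such as OMPScore against human judgment, as the abstract describes, then reduces to a rank-correlation computation, e.g. scipy.stats.spearmanr(metric_scores, human_scores).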
Related papers
- LLM-Supported Natural Language to Bash Translation [3.944966059637878]
We present a novel functional-equivalence check that combines command execution with evaluation of command outputs.
We show that parsing, in-context learning, in-weight learning, and constrained decoding can improve NL2SH accuracy by up to 32%.
arXiv Detail & Related papers (2025-02-07T19:35:55Z)
- Generating Unseen Code Tests In Infinitum [1.0674604700001968]
We present a method for creating benchmark variations that generalize across coding tasks and programming languages.
We implement one benchmark, called auto-regression, for the task of text-to-code generation in Python.
arXiv Detail & Related papers (2024-07-29T08:11:20Z)
- SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices [18.81344021150902]
SpecExec is a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families.
We demonstrate inference of 50B+ parameter LLMs on consumer GPUs with RAM offloading at 4-6 tokens per second with 4-bit quantization or 2-3 tokens per second with 16-bit weights.
arXiv Detail & Related papers (2024-06-04T17:53:36Z)
- Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [19.167604927651073]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance.
We propose a novel parallel prompt decoding that requires only 0.0002% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours.
Our approach demonstrates up to a 2.49× speedup and maintains a minimal memory overhead of just 0.0004%.
arXiv Detail & Related papers (2024-05-28T22:19:30Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding.
We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z)
- Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
- A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter Dataset [53.73316523766183]
We introduce a robust semantic frame parsing pipeline that can handle both OOD patterns and OOV tokens.
We also build an E2E application to demonstrate the feasibility of our algorithm and show why it is useful in real applications.
arXiv Detail & Related papers (2022-12-18T01:59:49Z)
- Simplifying and Understanding State Space Models with Diagonal Linear RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs.
We empirically show that, despite being conceptually much simpler, DLR is as performant as previously-proposed SSMs.
We also characterize the expressivity of SSMs and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z)