Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation
- URL: http://arxiv.org/abs/2306.15121v1
- Date: Tue, 27 Jun 2023 00:11:31 GMT
- Title: Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation
- Authors: William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter
- Abstract summary: We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing.
We test the generated kernel codes for a variety of language-supported programming models.
We propose a proficiency metric around the initial 10 suggestions given for each prompt.
- Score: 1.7646846505225735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate AI-assisted generative capabilities on fundamental numerical
kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV,
Jacobi Stencil, and CG. We test the generated kernel codes for a variety of
language-supported programming models, including (1) C++ (e.g., OpenMP
[including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g.,
OpenMP [including offload] and OpenACC), (3) Python (e.g., Numba, cuPy,
and pyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and
KernelAbstractions.jl). We use the GitHub Copilot capabilities powered by
OpenAI Codex available in Visual Studio Code as of April 2023 to generate a
large number of implementations given simple <kernel> + <programming model> +
<optional hints> prompt variants. To quantify and compare the results, we
propose a proficiency metric around the initial 10 suggestions given for each
prompt. Results suggest that the OpenAI Codex outputs for C++ correlate with
the adoption and maturity of programming models. For example, OpenMP and CUDA
score highly, whereas HIP still lags behind. We found that prompts for both a
targeted language such as Fortran and the more general-purpose Python benefit
from adding code keywords, while Julia prompts perform reasonably well for its
mature programming models (e.g., Threads and CUDA.jl). We expect these
benchmarks to provide a point of reference for each programming
model's community. Overall, understanding the convergence of large language
models, AI, and HPC is crucial due to its rapidly evolving nature and how it is
redefining human-computer interactions.
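As a concrete illustration (ours, not reproduced from the paper), the sketch
below shows the kind of kernel the prompts target: an AXPY loop parallelized
with OpenMP in C++. The leading comment only mimics the <kernel> +
<programming model> + <optional hints> prompt shape; the study's exact prompt
wording and generated outputs may differ.

    // Illustrative prompt shape only: <kernel> + <programming model> +
    // <optional hints>, e.g. "AXPY kernel, OpenMP, double precision"
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // AXPY: y = a*x + y, one of the fundamental kernels in the study,
    // parallelized with an OpenMP worksharing loop.
    void axpy(std::size_t n, double a, const double* x, double* y) {
    #pragma omp parallel for
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main() {
        std::vector<double> x(8, 1.0), y(8, 2.0);
        axpy(x.size(), 3.0, x.data(), y.data());
        std::cout << y[0] << '\n';  // prints 5 (3*1 + 2)
        return 0;
    }

Compiled with OpenMP enabled (e.g., g++ -fopenmp), this reflects the
single-kernel granularity at which suggestions were evaluated.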
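The paper's proficiency metric is defined over the first 10 Copilot
suggestions per prompt; its exact rubric is not reproduced here. As a
hypothetical sketch only, the code below assumes each suggestion i receives a
correctness score s[i] in [0, 1] and that earlier suggestions count more, via
an assumed rank weight of 1/(i+1):

    #include <array>
    #include <iostream>

    // Hypothetical position-weighted aggregate over the first 10
    // suggestions for one prompt. The 1/(i+1) weights are our
    // assumption, not the paper's definition.
    double proficiency(const std::array<double, 10>& s) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < 10; ++i) {
            const double w = 1.0 / (i + 1);  // rank 1 weighs most
            num += w * s[i];
            den += w;
        }
        return num / den;  // normalized to [0, 1]
    }

    int main() {
        // Example: only the first three suggestions are correct.
        std::array<double, 10> s{1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
        std::cout << proficiency(s) << '\n';  // ~0.63
        return 0;
    }

Under any such scheme, a model whose correct suggestions appear early scores
higher than one that buries them at the end, matching the intent of rating
the initial 10 suggestions.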
Related papers
- Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
- Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust [0.1906498126334485]
This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages.
We asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver.
The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes.
arXiv Detail & Related papers (2024-05-21T17:04:37Z)
- Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation [3.070523453466106]
We evaluate the use of the open-source Llama-2 model for generating high-performance computing kernels.
Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric.
arXiv Detail & Related papers (2023-09-12T01:19:54Z)
- Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++ [7.872005563259838]
The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods.
Models without prior coding knowledge experienced a 5.1× boost in CodeBLEU scores.
Models with some coding familiarity saw an impressive 9.9× increase.
arXiv Detail & Related papers (2023-07-15T02:35:51Z)
- Exploring Continual Learning for Code Generation Models [80.78036093054855]
Continual Learning (CL) is an important aspect that remains underexplored in the code domain.
We introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement.
We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism.
arXiv Detail & Related papers (2023-07-05T16:58:39Z)
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the general-purpose and popular ChatGPT and the targeted PragFormer models.
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
- HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code.
HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
arXiv Detail & Related papers (2023-04-24T19:16:03Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
- Extending Python for Quantum-Classical Computing via Quantum Just-in-Time Compilation [78.8942067357231]
Python is a popular programming language known for its flexibility, usability, readability, and focus on developer productivity.
We present a language extension to Python that enables heterogeneous quantum-classical computing via a robust C++ infrastructure for quantum just-in-time compilation.
arXiv Detail & Related papers (2021-05-10T21:11:21Z)
- Extending C++ for Heterogeneous Quantum-Classical Computing [56.782064931823015]
qcor is a language extension to C++ and compiler implementation that enables heterogeneous quantum-classical programming, compilation, and execution in a single-source context.
Our work provides a first-of-its-kind C++ compiler enabling high-level quantum kernel (function) expression in a quantum-language agnostic manner.
arXiv Detail & Related papers (2020-10-08T12:49:07Z)
- Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z)