Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation
- URL: http://arxiv.org/abs/2309.07103v1
- Date: Tue, 12 Sep 2023 01:19:54 GMT
- Title: Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation
- Authors: Pedro Valero-Lara, Alexis Huante, Mustafa Al Lail, William F. Godoy, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter
- Abstract summary: We evaluate the use of the open-source Llama-2 model for generating high-performance computing kernels.
Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric.
- Score: 3.070523453466106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate the use of the open-source Llama-2 model for generating
well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on
different parallel programming models and languages (e.g., C++: OpenMP, OpenMP
Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python:
numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We build
upon our previous work, which used OpenAI Codex (a descendant of GPT-3) to
generate similar kernels from simple prompts via GitHub Copilot. Our goal is to
compare the accuracy of Llama-2 against our original GPT-3 baseline using a
similar metric. Llama-2, though a simpler model, shows competitive or even
superior accuracy. We also report on the differences between these foundational
large language models as generative AI continues to redefine human-computer
interactions. Overall, Copilot generates code that is more reliable but less
optimized, whereas code generated by Llama-2 is less reliable but more
optimized when correct.
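For concreteness, AXPY is the simplest kernel in that set: y <- a*x + y. The sketch below shows the kind of code the study prompts for, written with numpy and Numba, two of the evaluated Python programming models; it is illustrative only, not an output generated in the paper.

```python
import numpy as np
from numba import njit, prange

def axpy_numpy(a, x, y):
    # numpy variant: the whole update is one vectorized expression.
    return a * x + y

@njit(parallel=True)
def axpy_numba(a, x, y):
    # Numba variant: the explicit loop is JIT-compiled and parallelized.
    for i in prange(x.shape[0]):
        y[i] = a * x[i] + y[i]
    return y
```

GEMV and GEMM follow the same pattern with one and two additional loop nests, which is what makes this set a convenient difficulty ladder across programming models.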
Related papers
- Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust [0.1906498126334485]
This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages.
We asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver.
The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes.
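Of those three, the parallel 1D stencil-based heat equation solver is the most representative HPC workload; a minimal explicit finite-difference sketch of such a solver is below (illustrative only, not the code generated in that study).

```python
import numpy as np

def heat_1d(u, alpha, dx, dt, steps):
    # Explicit scheme for u_t = alpha * u_xx with fixed boundary values.
    r = alpha * dt / dx**2  # explicit stability requires r <= 0.5
    for _ in range(steps):
        # Three-point stencil applied to all interior points at once.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u
```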
arXiv Detail & Related papers (2024-05-21T17:04:37Z)
- MPIrigen: MPI Code Generation through Domain-Specific Language Models [3.5352856644774806]
This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs.
We introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI.
The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation.
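For a sense of the target task, here is a minimal MPI-based parallel program, sketched with mpi4py; the corpus behind MPIrigen is C-based, so Python is used here only to keep this page's examples in one language.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank computes a partial sum; allreduce combines them on every rank.
local = sum(range(rank, 1000, size))
total = comm.allreduce(local, op=MPI.SUM)

if rank == 0:
    print(f"total={total} computed across {size} ranks")
```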
arXiv Detail & Related papers (2024-02-14T12:24:21Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
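Sketched below under the assumption of an embedding-based utility: pairwise MBR needs O(n^2) utility evaluations over n samples, while reference aggregation compares each candidate against a single aggregate representation in O(n).

```python
import numpy as np

def mbr_pairwise(cand_embs):
    # Standard sampling-based MBR: expected utility of each candidate
    # against all pseudo-references, O(n^2) similarity evaluations.
    sims = cand_embs @ cand_embs.T  # dot-product utility
    return int(np.argmax(sims.mean(axis=1)))

def mbr_aggregated(cand_embs):
    # Reference aggregation: compare each candidate to the mean
    # reference representation instead, O(n) evaluations.
    agg = cand_embs.mean(axis=0)
    return int(np.argmax(cand_embs @ agg))
```

For a plain dot-product utility the two selections coincide exactly (the mean of dot products equals the dot product with the mean); for nonlinear utilities the aggregate is an approximation, which is the regime the paper studies.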
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
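The baseline being contrasted can be pictured as nested reverse-mode autodiff, where each additional derivative order enlarges the traced computation graph; a quick illustration with jax (our tool choice here, not the paper's):

```python
import jax
import jax.numpy as jnp

def f(x):
    # A toy scalar function standing in for an implicit neural representation.
    return jnp.sin(3.0 * x) * jnp.exp(-x ** 2)

def nth_grad(fun, n):
    # Compose jax.grad n times; each composition grows the computation graph.
    for _ in range(n):
        fun = jax.grad(fun)
    return fun

d3f = nth_grad(f, 3)
print(d3f(0.5))  # third derivative of f at x = 0.5
```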
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++ [7.872005563259838]
The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods.
Models without prior coding knowledge experienced a boost of $\mathbf{\times 5.1}$ in CodeBLEU scores.
Models with some coding familiarity saw an impressive $\mathbf{\times 9.9}$-fold increase.
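A paired entry in such a translation dataset might look like the following; this is a hypothetical example of the format, not an entry drawn from the actual dataset.

```python
# One hypothetical translation pair: the same OpenMP loop in Fortran and C++.
pair = {
    "fortran": """
!$omp parallel do
do i = 1, n
   y(i) = a * x(i) + y(i)
end do
!$omp end parallel do
""",
    "cpp": """
#pragma omp parallel for
for (int i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
}
""",
}
```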
arXiv Detail & Related papers (2023-07-15T02:35:51Z)
- Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation [1.7646846505225735]
We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing.
We test the generated kernel codes for a variety of language-supported programming models.
We propose a proficiency metric around the initial 10 suggestions given for each prompt.
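One plausible shape for such a metric, assuming rank-weighted credit over the first 10 suggestions (a hypothetical reconstruction; the paper gives the exact definition):

```python
def proficiency(correct_flags, k=10):
    # Hypothetical rank-weighted score: suggestion i contributes
    # weight 1/(i+1) if it is correct, normalized to [0, 1].
    flags = correct_flags[:k]
    weights = [1.0 / (i + 1) for i in range(len(flags))]
    score = sum(w for w, ok in zip(weights, flags) if ok)
    return score / sum(weights)

# Example: only the first and fourth of ten suggestions are correct.
print(proficiency([True, False, False, True] + [False] * 6))
```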
arXiv Detail & Related papers (2023-06-27T00:11:31Z)
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the general-purpose and popular ChatGPT as well as the targeted PragFormer model.
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
- Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be applied to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
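The flavor of those ideas in a minimal numpy sketch: Nyström-style kernel ridge regression restricts the solution to m randomly sampled centers (a random projection of the kernel), reducing the linear solve from n x n to m x m; the paper's GPU solver adds preconditioned iterative optimization on top of approximations of this kind.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def nystrom_krr(X, y, m=100, lam=1e-3, sigma=1.0, seed=0):
    # Restrict the kernel ridge solution to m sampled centers so the
    # solve is m x m instead of n x n.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    Knm = gaussian_kernel(X, centers, sigma)        # n x m
    Kmm = gaussian_kernel(centers, centers, sigma)  # m x m
    alpha = np.linalg.solve(Knm.T @ Knm + len(X) * lam * Kmm, Knm.T @ y)
    return lambda Xt: gaussian_kernel(Xt, centers, sigma) @ alpha
```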
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to a low-performing solver, we derive the new solver MPLP++, which outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
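That division of labor can be pictured as tiled outer loops (the part the polyhedral machinery tunes for locality) wrapped around a small dense microkernel; a toy numpy sketch, illustrative only since the paper targets compiled deep-learning primitives:

```python
import numpy as np

def blocked_gemm(A, B, tile=64):
    # Tiled matrix multiply: the outer loops pick blocks, and the inner
    # '@' on a tile stands in for the hand-optimized microkernel.
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

# Sanity check against numpy's reference result:
A, B = np.random.rand(200, 150), np.random.rand(150, 100)
assert np.allclose(blocked_gemm(A, B), A @ B)
```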
arXiv Detail & Related papers (2020-02-06T08:02:34Z)