Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation
- URL: http://arxiv.org/abs/2309.07103v1
- Date: Tue, 12 Sep 2023 01:19:54 GMT
- Title: Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation
- Authors: Pedro Valero-Lara, Alexis Huante, Mustafa Al Lail, William F. Godoy, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter
- Abstract summary: We evaluate the use of the open-source Llama-2 model for generating high-performance computing kernels.
Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric.
- Score: 3.070523453466106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate the use of the open-source Llama-2 model for generating
well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on
different parallel programming models and languages (e.g., C++: OpenMP, OpenMP
Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python:
numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We build
upon our previous work, which used OpenAI Codex (a descendant of GPT-3) to
generate similar kernels from simple prompts via GitHub Copilot. Our goal is to
compare the accuracy of Llama-2 against our original GPT-3 baseline using a
similar metric. Llama-2, though a simpler model, shows competitive or even
superior accuracy. We also report on the differences between these foundational
large language models as generative AI continues to redefine human-computer
interactions. Overall, Copilot generates code that is more reliable but less
optimized, whereas code generated by Llama-2 is less reliable but more
optimized when correct.
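For concreteness, AXPY is the simplest kernel in that set: y <- a*x + y. The sketch below shows the kind of code the study prompts for, written with numpy and Numba, two of the evaluated Python programming models; it is illustrative only, not an output generated in the paper.

```python
import numpy as np
from numba import njit, prange

def axpy_numpy(a, x, y):
    # numpy variant: the whole update is one vectorized expression.
    return a * x + y

@njit(parallel=True)
def axpy_numba(a, x, y):
    # Numba variant: the explicit loop is JIT-compiled and parallelized.
    for i in prange(x.shape[0]):
        y[i] = a * x[i] + y[i]
    return y
```

GEMV and GEMM follow the same pattern with one and two additional loop nests, which is what makes this set a convenient difficulty ladder across programming models.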
Related papers
- Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust [0.1906498126334485]
This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages.
We asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver.
The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes.
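Of those three, the parallel 1D stencil-based heat equation solver is the most representative HPC workload; a minimal explicit finite-difference sketch of such a solver is below (illustrative only, not the code generated in that study).

```python
import numpy as np

def heat_1d(u, alpha, dx, dt, steps):
    # Explicit scheme for u_t = alpha * u_xx with fixed boundary values.
    r = alpha * dt / dx**2  # explicit stability requires r <= 0.5
    for _ in range(steps):
        # Three-point stencil applied to all interior points at once.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u
```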
arXiv Detail & Related papers (2024-05-21T17:04:37Z)
- MPIrigen: MPI Code Generation through Domain-Specific Language Models [3.5352856644774806]
This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs.
We introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI.
The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation.
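For a sense of the target task, here is a minimal MPI-based parallel program, sketched with mpi4py; the corpus behind MPIrigen is C-based, so Python is used here only to keep this page's examples in one language.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank computes a partial sum; allreduce combines them on every rank.
local = sum(range(rank, 1000, size))
total = comm.allreduce(local, op=MPI.SUM)

if rank == 0:
    print(f"total={total} computed across {size} ranks")
```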
arXiv Detail & Related papers (2024-02-14T12:24:21Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
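Sketched below under the assumption of an embedding-based utility: pairwise MBR needs O(n^2) utility evaluations over n samples, while reference aggregation compares each candidate against a single aggregate representation in O(n).

```python
import numpy as np

def mbr_pairwise(cand_embs):
    # Standard sampling-based MBR: expected utility of each candidate
    # against all pseudo-references, O(n^2) similarity evaluations.
    sims = cand_embs @ cand_embs.T  # dot-product utility
    return int(np.argmax(sims.mean(axis=1)))

def mbr_aggregated(cand_embs):
    # Reference aggregation: compare each candidate to the mean
    # reference representation instead, O(n) evaluations.
    agg = cand_embs.mean(axis=0)
    return int(np.argmax(cand_embs @ agg))
```

For a plain dot-product utility the two selections coincide exactly (the mean of dot products equals the dot product with the mean); for nonlinear utilities the aggregate is an approximation, which is the regime the paper studies.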
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
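The baseline being contrasted can be pictured as nested reverse-mode autodiff, where each additional derivative order enlarges the traced computation graph; a quick illustration with jax (our tool choice here, not the paper's):

```python
import jax
import jax.numpy as jnp

def f(x):
    # A toy scalar function standing in for an implicit neural representation.
    return jnp.sin(3.0 * x) * jnp.exp(-x ** 2)

def nth_grad(fun, n):
    # Compose jax.grad n times; each composition grows the computation graph.
    for _ in range(n):
        fun = jax.grad(fun)
    return fun

d3f = nth_grad(f, 3)
print(d3f(0.5))  # third derivative of f at x = 0.5
```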
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++ [7.872005563259838]
The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods.
Models without prior coding knowledge experienced a boost of $\mathbf{\times 5.1}$ in CodeBLEU scores.
Models with some coding familiarity saw an impressive $\mathbf{\times 9.9}$-fold increase.
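A paired entry in such a translation dataset might look like the following; this is a hypothetical example of the format, not an entry drawn from the actual dataset.

```python
# One hypothetical translation pair: the same OpenMP loop in Fortran and C++.
pair = {
    "fortran": """
!$omp parallel do
do i = 1, n
   y(i) = a * x(i) + y(i)
end do
!$omp end parallel do
""",
    "cpp": """
#pragma omp parallel for
for (int i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
}
""",
}
```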
arXiv Detail & Related papers (2023-07-15T02:35:51Z)
- Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation [1.7646846505225735]
We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing.
We test the generated kernel codes for a variety of language-supported programming models.
We propose a proficiency metric around the initial 10 suggestions given for each prompt.
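One plausible shape for such a metric, assuming rank-weighted credit over the first 10 suggestions (a hypothetical reconstruction; the paper gives the exact definition):

```python
def proficiency(correct_flags, k=10):
    # Hypothetical rank-weighted score: suggestion i contributes
    # weight 1/(i+1) if it is correct, normalized to [0, 1].
    flags = correct_flags[:k]
    weights = [1.0 / (i + 1) for i in range(len(flags))]
    score = sum(w for w, ok in zip(weights, flags) if ok)
    return score / sum(weights)

# Example: only the first and fourth of ten suggestions are correct.
print(proficiency([True, False, False, True] + [False] * 6))
```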
arXiv Detail & Related papers (2023-06-27T00:11:31Z)
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code.
OMPify is based on a Transformer-based model that leverages a graph-based representation of source code.
Our results demonstrate that OMPify outperforms existing approaches, including the general-purpose and popular ChatGPT as well as the targeted PragFormer model.
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
- Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be applied to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
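The flavor of those ideas in a minimal numpy sketch: Nyström-style kernel ridge regression restricts the solution to m randomly sampled centers (a random projection of the kernel), reducing the linear solve from n x n to m x m; the paper's GPU solver adds preconditioned iterative optimization on top of approximations of this kind.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def nystrom_krr(X, y, m=100, lam=1e-3, sigma=1.0, seed=0):
    # Restrict the kernel ridge solution to m sampled centers so the
    # solve is m x m instead of n x n.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    Knm = gaussian_kernel(X, centers, sigma)        # n x m
    Kmm = gaussian_kernel(centers, centers, sigma)  # m x m
    alpha = np.linalg.solve(Knm.T @ Knm + len(X) * lam * Kmm, Knm.T @ y)
    return lambda Xt: gaussian_kernel(Xt, centers, sigma) @ alpha
```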
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to a low-performing solver, we derive the new solver MPLP++, which outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
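That division of labor can be pictured as tiled outer loops (the part the polyhedral machinery tunes for locality) wrapped around a small dense microkernel; a toy numpy sketch, illustrative only since the paper targets compiled deep-learning primitives:

```python
import numpy as np

def blocked_gemm(A, B, tile=64):
    # Tiled matrix multiply: the outer loops pick blocks, and the inner
    # '@' on a tile stands in for the hand-optimized microkernel.
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

# Sanity check against numpy's reference result:
A, B = np.random.rand(200, 150), np.random.rand(150, 100)
assert np.allclose(blocked_gemm(A, B), A @ B)
```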
arXiv Detail & Related papers (2020-02-06T08:02:34Z)