LLM & HPC: Benchmarking DeepSeek's Performance in High-Performance Computing Tasks
- URL: http://arxiv.org/abs/2504.03665v2
- Date: Mon, 28 Apr 2025 20:31:31 GMT
- Title: LLM & HPC: Benchmarking DeepSeek's Performance in High-Performance Computing Tasks
- Authors: Noujoud Nader, Patrick Diehl, Steve Brandt, Hartmut Kaiser
- Abstract summary: Large Language Models (LLMs) have been applied to a wide range of domains in software engineering. This paper evaluates how well DeepSeek, a recent LLM, performs in generating a set of HPC benchmark codes.
- Score: 0.1906498126334485
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large Language Models (LLMs), such as GPT-4 and DeepSeek, have been applied to a wide range of domains in software engineering. However, their potential in the context of High-Performance Computing (HPC) remains largely unexplored. This paper evaluates how well DeepSeek, a recent LLM, performs in generating a set of HPC benchmark codes: a conjugate gradient solver, the parallel heat equation, parallel matrix multiplication, DGEMM, and the STREAM triad operation. We analyze DeepSeek's code generation capabilities for traditional HPC languages such as C++, Fortran, Julia, and Python. The evaluation includes testing for code correctness, performance, and scaling across different configurations and matrix sizes. We also provide a detailed comparison between DeepSeek and another widely used tool: GPT-4. Our results demonstrate that while DeepSeek generates functional code for HPC tasks, it lags behind GPT-4 in terms of scalability and execution efficiency of the generated code.
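To make concrete the kind of kernel the benchmark suite asks an LLM to produce, the sketch below implements the STREAM triad operation mentioned in the abstract (a[i] = b[i] + scalar * c[i]) in C++ with OpenMP. It is a minimal illustration only; the array length, timing harness, and bandwidth estimate are our own assumptions and are not taken from the paper.

```cpp
// Minimal STREAM triad sketch (assumed parameters, not from the paper).
// Compile with, e.g.: g++ -O3 -fopenmp stream_triad.cpp -o stream_triad
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const long long n = 1LL << 25;   // illustrative length: ~33M doubles per array (~0.8 GB total)
    const double scalar = 3.0;
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);

    const auto t0 = std::chrono::steady_clock::now();
    // STREAM triad: a[i] = b[i] + scalar * c[i]
#pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        a[i] = b[i] + scalar * c[i];
    }
    const auto t1 = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    const double bytes = 3.0 * sizeof(double) * static_cast<double>(n); // two reads + one write per iteration
    std::printf("STREAM triad: %.3f s, ~%.2f GB/s\n", seconds, bytes / seconds / 1e9);
    return 0;
}
```

A kernel of this shape is memory-bandwidth bound, so measured performance reflects sustained memory bandwidth rather than arithmetic throughput, which is one reason such benchmarks are evaluated across different configurations and problem sizes.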
Related papers
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [57.09163579304332]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories.
PaperCoder operates in three stages; in the planning stage, it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files.
We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z) - Do Large Language Models Understand Performance Optimization? [0.9320657506524149]
Large Language Models (LLMs) have emerged as powerful tools for software development tasks such as code completion, translation, and optimization. This paper presents a benchmark suite encompassing multiple critical HPC computational motifs to evaluate the performance of code optimized by state-of-the-art LLMs.
arXiv Detail & Related papers (2025-03-17T23:30:23Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks.
We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - LLM-Vectorizer: LLM-based Verified Loop Vectorizer [12.048697450464935]
Large language models (LLMs) can generate vectorized code from scalar programs that process individual array elements.
LLMs are capable of producing high-performance vectorized code with run-time speedups ranging from 1.1x to 9.4x.
Our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
arXiv Detail & Related papers (2024-06-07T07:04:26Z) - StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - MonoCoder: Domain-Specific Code Language Model for HPC Codes and Tasks [5.125171374181664]
A growing trend in AI for software development is to develop large language models (LLMs) to address a variety of programming tasks.
Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size and demand expensive compute resources for training.
This is partly because LLMs for HPC tasks are obtained by finetuning existing LLMs that support several natural and/or programming languages.
We build an HPC-specific LM, named MonoCoder, which is orders of magnitude smaller than existing LMs but delivers better performance on non-HPC and HPC codes.
arXiv Detail & Related papers (2023-12-20T15:11:06Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models [57.04178959678024]
We show that the majority of inference computations for large generative models can be performed with both weights and activations being cast to 4 bits.
We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit.
We provide GPU kernels matching the QUIK format with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.4x.
arXiv Detail & Related papers (2023-10-13T17:15:05Z) - Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs).
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z) - Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation [3.070523453466106]
We evaluate the use of the open-source Llama-2 model for generating high-performance computing kernels.
Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric.
arXiv Detail & Related papers (2023-09-12T01:19:54Z) - Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++ [7.872005563259838]
The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods.
Models without prior coding knowledge experienced a boost of $\mathbf{\times 5.1}$ in CodeBLEU scores.
Models with some coding familiarity saw an impressive $\mathbf{\times 9.9}$-fold increase.
arXiv Detail & Related papers (2023-07-15T02:35:51Z) - LM4HPC: Towards Effective Language Model Application in High-Performance Computing [0.46180371154032884]
We design the LM4HPC framework to facilitate the research and development of HPC software analyses and optimizations using LMs.
Our framework is built on top of a range of components from different levels of the machine learning software stack, with Hugging Face-compatible APIs.
The results show that LM4HPC can help users quickly evaluate a set of state-of-the-art models and generate insightful leaderboards.
arXiv Detail & Related papers (2023-06-26T18:05:03Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.