Related papers: From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation

From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation

URL: http://arxiv.org/abs/2602.22240v1
Date: Tue, 24 Feb 2026 09:49:10 GMT
Title: From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation
Authors: Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger,
Abstract summary: Large Language Models show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied.<n>This paper explores how LLMs generate task-based parallel code from three kinds of input prompts.<n>We focus on three programming frameworks: OpenMP Tasking, C++ standard parallelism, and the asynchronous many-task runtime HPX.
Score: 39.426381252597146
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLM) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts: natural language problem descriptions, sequential reference implementations, and parallel pseudo code. We focus on three programming frameworks: OpenMP Tasking, C++ standard parallelism, and the asynchronous many-task runtime HPX. Each framework offers different levels of abstraction and control for task execution. We evaluate LLM-generated solutions for correctness and scalability. Our results reveal both strengths and weaknesses of LLMs with regard to problem complexity and framework. Finally, we discuss what these findings mean for future LLM-assisted development in high-performance and scientific computing.

Related papers

Operational Robustness of LLMs on Code Generation [2.9232837969697965]
It is now common practice in software development for large language models (LLMs) to be used to generate program code.<n>This paper is concerned in particular with how sensitive LLMs are to variations in descriptions of the coding tasks.<n>Existing techniques for evaluating this robustness are unsuitable for code generation because the input data space of natural language descriptions is discrete.
arXiv Detail & Related papers (2026-02-21T11:21:13Z)
On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization [54.965787768076254]
Large Language Models have been recently exploited as judges for complex natural language processing tasks, such as Q&A.<n>We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization.
arXiv Detail & Related papers (2025-07-22T13:40:26Z)
Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks [15.69049038121735]
Graph computational tasks are inherently challenging and often demand advanced algorithms for effective solutions.<n>Existing approaches are constrained by large language models' limited capability to comprehend complex graph structures.<n>We introduce a novel framework, PIE, which consists of three key steps: problem understanding, prompt design, and code generation.
arXiv Detail & Related papers (2025-01-23T15:04:22Z)
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [92.62952504133926]
This study evaluated the performance of three leading closed-source LLMs and six popular open-source LLMs on three commonly used benchmarks.<n>We developed a taxonomy of bugs for incorrect codes and analyzed the root cause for common bug types.<n>We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts [21.819126948549766]
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts. APPL acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions.
arXiv Detail & Related papers (2024-06-19T02:29:59Z)
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels. We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings. We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
An LLM Compiler for Parallel Function Calling [68.04566807806071]
We introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to 9% compared to ReAct.
arXiv Detail & Related papers (2023-12-07T18:32:04Z)
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation [18.354576598908448]
Large Language Models (LLMs) have demonstrated remarkable performance on assisting humans in programming. Existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. We introduce CodeScope, an execution-based, multilingual, multitask, multidimensional evaluation benchmark.
arXiv Detail & Related papers (2023-11-14T23:18:52Z)
Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.