Related papers: PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback

PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback

URL: http://arxiv.org/abs/2412.03578v1
Date: Mon, 18 Nov 2024 06:22:38 GMT
Title: PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
Authors: Yun Peng, Akhilesh Deepak Gotmare, Michael Lyu, Caiming Xiong, Silvio Savarese, Doyen Sahoo,
Abstract summary: Large Language Models (LLMs) are widely adopted for assisting in software development tasks.<n>We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
Score: 78.89596149768458
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) are widely adopted for assisting in software development tasks, yet their performance evaluations have narrowly focused on the functional correctness of generated code. Human programmers, however, require LLM-generated code to be not only correct but also optimally efficient. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code by incorporating feedback based on runtime during test case execution into the self-refinement iterations. With PerfCodeGen, we achieve speedups for a significantly higher proportion of problems compared to using the base LLM with sophisticated prompting techniques. Applied to open language models like Phi-3-mini, PerfCodeGen achieves runtime efficiency comparable to prompting powerful closed models like GPT-4. We achieve state-of-the-art runtime efficiency on benchmarks such as HumanEval, MBPP, and APPS, frequently surpassing the ground truth reference solutions with PerfCodeGen using GPT-3.5 and GPT-4. Additionally, we demonstrate the effectiveness of our approach in enhancing code quality across a range of open LLMs of varying sizes including Phi-3-mini, Llama 3 8B, Mixtral 8x7B, Command R, and Llama 3 70B.

Related papers

Evaluating and Achieving Controllable Code Completion in Code LLM [89.64782747840225]
We present the first instruction-guided code completion benchmark, Controllable Code Completion Benchmark (C3-Bench)<n>We reveal substantial gaps in instruction-following capabilities between open-source and advanced proprietary models during code completion tasks.<n>The resulting model, Qwen2.5-Coder-C3, achieves state-of-the-art performance on C3-Bench.
arXiv Detail & Related papers (2026-01-22T11:40:04Z)
FasterPy: An LLM-based Code Execution Efficiency Optimization Framework [11.766544835516974]
Code often suffers from performance bugs.<n>Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs.<n>We propose FasterPy, a framework that adapts Large Language Models to optimize the execution efficiency of Python code.
arXiv Detail & Related papers (2025-12-28T07:43:08Z)
PerfCoder: Large Language Models for Interpretable Code Performance Optimization [15.79612555952707]
PerfCoder is a family of large language models (LLMs) designed to generate performance-enhanced code from source code.<n>PerfCoder is fine-tuned on a curated collection of real-world optimization trajectories with human-readable annotations.<n>PerfCoder surpasses all existing models in both runtime speedup and effective optimization rate.
arXiv Detail & Related papers (2025-12-16T02:30:04Z)
An Experimental Study of Real-Life LLM-Proposed Performance Improvements [2.503024366864326]
Large Language Models (LLMs) can generate code, but can they generate fast code?<n>We study this question using a dataset of 65 real-world tasks mined from open-source Java programs.
arXiv Detail & Related papers (2025-10-17T10:06:52Z)
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization [46.33639431414019]
Large Language Models generate functionally correct solutions but often fall short in code efficiency.<n>We introduce a novel test-time iterative optimization framework to address this.
arXiv Detail & Related papers (2025-05-29T12:14:29Z)
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning [9.20863636863631]
Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks.<n>We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO)<n>Our model, Qwen2.5-Coder-7B-PPO, achieves 96.4% test pass rates and an average speedup of 1.47x over the gcc -O3 baseline.
arXiv Detail & Related papers (2025-05-16T17:40:45Z)
Quantizing Large Language Models for Code Generation: A Differentiated Replication [51.85505914274633]
Large Language Models (LLMs) have shown an impressive capability in code generation and, specifically, to automatically implement requirements described in natural language. LLMs pose significant challenges related to their memory (and, consequently, carbon) footprint. New frontier for LLM quantization is 4-bit precision, resulting in an average memory footprint reduction of 70%.
arXiv Detail & Related papers (2025-03-10T09:26:08Z)
COFFE: A Code Efficiency Benchmark for Code Generation [20.79578698298569]
We propose COFFE, a code generation benchmark for evaluating the time efficiency of LLM-generated code solutions. COFFE contains 398 and 358 problems for function-level and file-level code generation, respectively. For the time evaluation metric, we propose efficienct@k based on CPU instruction count to ensure a stable and solid comparison between different solutions.
arXiv Detail & Related papers (2025-02-05T02:08:51Z)
Optimizing Code Runtime Performance through Context-Aware Retrieval-Augmented Generation [8.574686422653345]
Auto achieves a 7.3% improvement in execution efficiency over GPT-4o across common generated executable code. This study introduces an in-context learning approach designed to bridge the gap by enabling LLMs to automatically generate optimized code.
arXiv Detail & Related papers (2025-01-28T04:00:35Z)
Effi-Code: Unleashing Code Efficiency in Language Models [17.355845751737423]
Effi-Code is an approach to enhancing code generation in large language models. Effi-Code offers a scalable and generalizable approach to improving code generation in AI systems.
arXiv Detail & Related papers (2024-10-14T07:05:51Z)
CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency. CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
Applying RLAIF for Code Generation with API-usage in Lightweight LLMs [15.366324461797582]
Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains. This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (1B parameters) LLMs.
arXiv Detail & Related papers (2024-06-28T17:16:03Z)
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark [39.13045037676502]
Development of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Most evaluation frameworks focus on the (functional) correctness of generated code; efficiency, as an important measure of code quality, has been overlooked in existing evaluations. We develop ENAMEL, a rigorous and high-standard benchmark for evaluating the capability of LLMs in generating efficient code.
arXiv Detail & Related papers (2024-06-10T04:19:20Z)
On Evaluating the Efficiency of Source Code Generated by LLMs [31.8121544062256]
More efficient code can lead to higher performance and execution efficiency of programs and software completed by LLM-assisted programming. First, we evaluate the efficiency of the code generated by LLMs on two benchmarks, HumanEval and MBPP. Then, we choose a set of programming problems from the online judge platform LeetCode to conduct a more difficult evaluation.
arXiv Detail & Related papers (2024-04-09T05:59:39Z)
Exploring Data-Efficient Adaptation of Large Language Models for Code Generation [64.5583894165813]
We propose a novel adaptation approach named DEED, which stands for Data-Efficient adaptation with Error-Driven learning for code generation.<n> Experimental results show that, compared to other mainstream fine-tuning approaches, DEED achieves superior performance with few training data.
arXiv Detail & Related papers (2024-02-29T16:09:02Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks. FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization. First, we curate a dataset of performance-improving edits made by human programmers of over 77,000 competitive C++ programming submission pairs. For prompting, we propose retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.