Related papers: Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

URL: http://arxiv.org/abs/2505.11480v1
Date: Fri, 16 May 2025 17:40:45 GMT
Title: Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken
Abstract summary: Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks. We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO). Our model, Qwen2.5-Coder-7B-PPO, achieves 96.0% test pass rates and an average speedup of 1.47x over the gcc -O3 baseline.
Score: 9.20863636863631
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks, yet their potential for code optimization remains underexplored. This work investigates whether LLMs can optimize the performance of assembly code, where fine-grained control over execution enables improvements that are difficult to express in high-level languages. We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO), guided by a reward function that considers both functional correctness, validated through test cases, and execution performance relative to the industry-standard compiler gcc -O3. To support this study, we introduce a benchmark of 8,072 real-world programs. Our model, Qwen2.5-Coder-7B-PPO, achieves 96.0% test pass rates and an average speedup of 1.47x over the gcc -O3 baseline, outperforming all 20 other models evaluated, including Claude-3.7-sonnet. These results indicate that reinforcement learning can unlock the potential of LLMs to serve as effective optimizers for assembly code performance.
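The abstract describes a reward that combines functional correctness (validated through test cases) with execution performance relative to gcc -O3, but it does not give the formula. The Python sketch below assumes one simple form: zero reward if any test fails, otherwise the candidate's speedup over the gcc -O3 baseline. The `Evaluation` record, the function names, and the convention of counting a failing program as a 1.0x speedup when averaging are hypothetical choices for illustration, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical record of running one candidate assembly program."""
    tests_passed: int
    tests_total: int
    candidate_time: float  # runtime of the LLM-generated assembly (seconds)
    baseline_time: float   # runtime of the gcc -O3 output on the same inputs (seconds)


def speedup(ev: Evaluation) -> float:
    """Speedup over gcc -O3; values above 1.0 mean the candidate is faster."""
    if ev.candidate_time <= 0:
        raise ValueError("candidate_time must be positive")
    return ev.baseline_time / ev.candidate_time


def reward(ev: Evaluation) -> float:
    """Assumed reward: 0.0 unless every test passes, otherwise the speedup."""
    if ev.tests_passed < ev.tests_total:
        return 0.0
    return speedup(ev)


def average_speedup(evals: list[Evaluation]) -> float:
    """Mean speedup; a failing program counts as 1.0x (assumed fallback to gcc -O3)."""
    per_program = [
        speedup(ev) if ev.tests_passed == ev.tests_total else 1.0 for ev in evals
    ]
    return sum(per_program) / len(per_program)
```

Gating the speedup on full test passage means the policy earns nothing for fast but incorrect assembly, which is one plausible way to keep correctness ahead of performance; a real reward might also penalize compilation failures or scale the speedup differently.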

Related papers

dInfer: An Efficient Inference Framework for Diffusion Language Models [54.80918957287927]
Diffusion-based large language models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs. We present dInfer, an efficient framework for dLLM inference.
arXiv Detail & Related papers (2025-10-09T16:19:42Z)
Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
SysLLMatic: Large Language Models are Software System Optimizers [5.342249053033864]
We present SysLLMatic, a system that integrates LLMs with performance diagnostics feedback and a curated catalog of 43 optimization patterns. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapoBench (large-scale software systems in Java). Results show that SysLLMatic can improve software system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization.
arXiv Detail & Related papers (2025-06-02T01:57:21Z)
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization [46.33639431414019]
Large Language Models generate functionally correct solutions but often fall short in code efficiency. We introduce a novel test-time iterative optimization framework to address this.
arXiv Detail & Related papers (2025-05-29T12:14:29Z)
EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking [58.15568681219339]
We introduce EquiBench, a new benchmark for evaluating large language models (LLMs) on equivalence checking, i.e., deciding whether two programs are semantically equivalent. This task directly tests a model's ability to reason about program semantics. We evaluate 19 state-of-the-art LLMs and find that, in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
arXiv Detail & Related papers (2025-02-18T02:54:25Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. The LLM is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z)
Large Language Models as Code Executors: An Exploratory Study [29.545321608864295]
This paper pioneers the exploration of Large Language Models (LLMs) as code executors. We are the first to examine this feasibility across various LLMs, including OpenAI's o1, GPT-4o, GPT-3.5, DeepSeek, and Qwen-Coder. We introduce an Iterative Instruction Prompting (IIP) technique that processes code snippets line by line, enhancing the accuracy of weaker models by an average of 7.22%.
arXiv Detail & Related papers (2024-10-09T08:23:22Z)
Optimization of Armv9 architecture general large language model inference performance based on Llama.cpp [0.3749861135832073]
This article optimizes the inference performance of the Qwen-1.8B model by performing Int8 quantization, vectorizing some operators in llama.cpp, and modifying the compilation script. On the Yitian 710 experimental platform, prefill performance increases by 1.6x, decoding performance increases by 24x, memory usage is reduced to 1/5 of the original, and the accuracy loss is almost negligible.
arXiv Detail & Related papers (2024-06-16T06:46:25Z)
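The Armv9 entry above mentions Int8 quantization. As a rough illustration of the general technique (not that paper's implementation and not llama.cpp's exact storage format), the Python sketch below applies symmetric int8 quantization per block of weights and keeps one float scale per block. The block size of 32 and all function names are assumptions.

```python
def quantize_block(block: list[float]) -> tuple[float, list[int]]:
    """Symmetric int8 quantization: the largest |value| in the block maps to 127."""
    amax = max((abs(v) for v in block), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero for all-zero blocks
    q = [max(-127, min(127, round(v / scale))) for v in block]
    return scale, q


def dequantize_block(scale: float, q: list[int]) -> list[float]:
    """Approximate reconstruction of the original floats from int8 codes."""
    return [scale * x for x in q]


def quantize(values: list[float], block_size: int = 32) -> list[tuple[float, list[int]]]:
    """Split a weight vector into blocks and quantize each with its own scale."""
    return [
        quantize_block(values[i : i + block_size])
        for i in range(0, len(values), block_size)
    ]
```

A per-block scale bounds the rounding error of each weight by half of its block's scale, so one large outlier only degrades precision inside its own block rather than across the whole tensor.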
Performance-Aligned LLMs for Generating Fast Code [2.180216161965907]
We introduce a reinforcement learning based methodology to align the outputs of code LLMs with performance. We demonstrate that our fine-tuned model improves the expected speedup of generated code over base models for a set of benchmark tasks.
arXiv Detail & Related papers (2024-04-29T16:52:38Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation consisting of two main components. CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks. FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
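StepCoder's FGO component excludes unexecuted code segments from optimization. The sketch below illustrates that masking idea on a simplified per-token policy-gradient loss (a plain REINFORCE-style objective, not StepCoder's full PPO loss). The function name and the per-token `executed` flags, assumed to come from coverage information gathered while running the tests, are hypothetical.

```python
def masked_policy_loss(
    logprobs: list[float],
    advantages: list[float],
    executed: list[bool],
) -> float:
    """Mean of -advantage * logprob over tokens whose code was executed.

    Tokens in unexecuted code segments are masked out, so they receive no
    gradient signal. Simplified REINFORCE-style objective (assumption), not PPO.
    """
    if not (len(logprobs) == len(advantages) == len(executed)):
        raise ValueError("per-token inputs must have equal length")
    total = 0.0
    kept = 0
    for lp, adv, ex in zip(logprobs, advantages, executed):
        if ex:  # only tokens from executed code contribute to the loss
            total += -adv * lp
            kept += 1
    return total / kept if kept else 0.0
```

Because masked tokens never enter the sum, changing the log-probability of an unexecuted token leaves the loss unchanged; the reward from the test run is attributed only to code the tests actually exercised.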
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models [26.2566707495948]
Large Language Models (LLMs) have seen great advances in both academia and industry. We benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs in different sizes. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs.
arXiv Detail & Related papers (2023-11-07T03:25:56Z)
Large Language Models for Compiler Optimization [22.52765975286403]
We present a transformer model trained from scratch to optimize LLVM assembly for code size. We ask the model to predict the instruction counts before and after optimization, and the optimized code itself. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler.
arXiv Detail & Related papers (2023-09-11T22:11:46Z)
Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization. First, we curate a dataset of performance-improving edits made by human programmers, comprising over 77,000 pairs of competitive C++ programming submissions. For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we explore performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models. We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler. We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax. Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.