SysLLMatic: Large Language Models are Software System Optimizers
- URL: http://arxiv.org/abs/2506.01249v2
- Date: Fri, 10 Oct 2025 22:01:02 GMT
- Title: SysLLMatic: Large Language Models are Software System Optimizers
- Authors: Huiyun Peng, Arjun Gupte, Ryan Hasler, Nicholas John Eliopoulos, Chien-Chou Ho, Rishi Mantri, Leo Deng, Konstantin Läufer, George K. Thiruvathukal, James C. Davis,
- Abstract summary: We present SysLLMatic, a system that integrates LLMs with performance diagnostics feedback and a curated catalog of 43 optimization patterns.<n>We evaluate it on three benchmark suites: HumanEval_ CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapoBench (large-scale software systems in Java)<n>Results show that SysLLMatic can improve software system performance, including latency, throughput energy efficiency, memory usage, and CPU utilization.
- Score: 5.342249053033864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic software system optimization can improve software speed and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases. Recent methods using LLMs introduce automation, but they do not scale effectively to the complexity and size of real-world software systems, leaving a gap in practical applicability. We present SysLLMatic, a system that integrates LLMs with performance diagnostics feedback and a curated catalog of 43 optimization patterns to automatically optimize software code. Our approach builds on recent advances in LLM-based code optimization and specifically targets the limitations of existing systems in handling real-world software applications. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapoBench (large-scale software systems in Java). Results show that SysLLMatic can improve software system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, to which prior LLM approaches have not scaled, it surpasses compiler optimizations, achieving average relative improvements of 1.5x in latency (vs. 1.01x for the compiler) and 1.76x in throughput (vs. 1.02x for the compiler). Our findings demonstrate that LLMs, guided by principled system thinking through the optimization pattern catalog and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.
Related papers
- PerfCoder: Large Language Models for Interpretable Code Performance Optimization [15.79612555952707]
PerfCoder is a family of large language models (LLMs) designed to generate performance-enhanced code from source code.<n>PerfCoder is fine-tuned on a curated collection of real-world optimization trajectories with human-readable annotations.<n>PerfCoder surpasses all existing models in both runtime speedup and effective optimization rate.
arXiv Detail & Related papers (2025-12-16T02:30:04Z) - STARK: Strategic Team of Agents for Refining Kernels [23.717055490630596]
We introduce an agentic framework for GPU kernel optimization that explores the design space through multi-agent collaboration.<n>This framework mimics the workflow of expert engineers, enabling LLMs to reason about hardware trade-offs, incorporate profiling feedback, and refine kernels iteratively.<n>We evaluate our approach on KernelBench, a benchmark for LLM-based kernel optimization, and demonstrate substantial improvements over baseline agents.
arXiv Detail & Related papers (2025-10-19T20:41:46Z) - Tutoring LLM into a Better CUDA Optimizer [0.0]
We focus on the capabilities of the most recent reasoning models to generate optimized code for predefined, well-known tasks.<n>Our objective is to determine which types of code optimizations and parallel patterns the LLMs can perform by themselves and whether they can be improved by tutoring.
arXiv Detail & Related papers (2025-10-19T17:09:15Z) - SemOpt: LLM-Driven Code Optimization via Rule-Based Analysis [7.451702651736553]
SemOpt is a framework that leverages static program analysis to precisely identify optimizable code segments, retrieve the corresponding optimization strategies, and generate the optimized results.<n>On our benchmark containing 151 optimization tasks, SemOpt demonstrates its effectiveness under different LLMs by increasing the number of successful optimizations by 1.38 to 28 times compared to the baseline.
arXiv Detail & Related papers (2025-10-18T07:44:18Z) - xLLM Technical Report [57.13120905321185]
We introduce xLLM, an intelligent and efficient Large Language Model (LLM) inference framework.<n>xLLM builds a novel decoupled service-engine architecture.<n>xLLM-Engine co-optimizes system and algorithm designs to fully saturate computing resources.
arXiv Detail & Related papers (2025-10-16T13:53:47Z) - Do Large Language Models Understand Performance Optimization? [0.9320657506524149]
Large Language Models (LLMs) have emerged as powerful tools for software development tasks such as code completion, translation, and optimization.<n>This paper presents a benchmark suite encompassing multiple critical HPC computational motifs to evaluate the performance of code optimized by state-of-the-art LLMs.
arXiv Detail & Related papers (2025-03-17T23:30:23Z) - Autellix: An Efficient Serving Engine for LLM Agents as General Programs [59.673243129044465]
Large language model (LLM) applications are evolving beyond simple chatbots into dynamic, general-purpose agentic programs.<n>Existing LLM serving systems ignore dependencies between programs and calls, missing significant opportunities for optimization.<n>We introduce Autellix, an LLM serving system that treats programs as first-class citizens to minimize their end-to-end latencies.
arXiv Detail & Related papers (2025-02-19T18:59:30Z) - Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels [12.77187564450236]
We introduce XY-Serve, a versatile, Ascend native, end-to-end production large language model (LLM) serving system.<n>The core idea is an abstraction mechanism that smooths out the workload variability by decomposing computations into fine-grained meta primitives.<n>For GEMM, we introduce a virtual padding scheme that adapts to dynamic shape changes while using highly efficient GEMM primitives with assorted fixed tile sizes.
arXiv Detail & Related papers (2024-12-24T02:27:44Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks.<n>We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - Training of Scaffolded Language Models with Language Supervision: A Survey [62.59629932720519]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs.<n>We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z) - Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces [9.880183350366792]
A key challenge in improving parallel program performance is efficiently mapping tasks to processors and data to memory.<n>We introduce a framework that automates mapper development with generative optimization.<n>Our approach finds mappers that surpass expert-written mappers by up to 1.34X speedup across nine benchmarks.
arXiv Detail & Related papers (2024-10-21T04:08:37Z) - Should AI Optimize Your Code? A Comparative Study of Classical Optimizing Compilers Versus Current Large Language Models [0.0]
Large Language Models (LLMs) raise intriguing questions about the potential of these AI approaches to revolutionize code optimization.<n>This work aims to answer an essential question for the compiler community: "Can AI-driven models revolutionize the way we approach code optimization?"<n>We present a comparative analysis between three classical optimizing compilers and two recent large language models.
arXiv Detail & Related papers (2024-06-17T23:26:41Z) - A Problem-Oriented Perspective and Anchor Verification for Code Optimization [43.28045750932116]
Large language models (LLMs) have shown remarkable capabilities in solving various programming tasks.<n>This paper investigates the capabilities of LLMs in optimizing code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z) - LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
We show that gradient-based and high-level LLMs can effectively collaborate a combined optimization framework.<n>In this paper, we show that these complementary to each other and can effectively collaborate a combined optimization framework.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - CompilerDream: Learning a Compiler World Model for General Code Optimization [58.87557583347996]
We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization.
It comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies.
It excels across diverse datasets, surpassing LLVM's built-in optimizations and other state-of-the-art methods in both settings of value prediction and end-to-end code optimization.
arXiv Detail & Related papers (2024-04-24T09:20:33Z) - Dissecting the Runtime Performance of the Training, Fine-tuning, and
Inference of Large Language Models [26.2566707495948]
Large Language Models (LLMs) have seen great advance in both academia and industry.
We benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs in different sizes.
Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs.
arXiv Detail & Related papers (2023-11-07T03:25:56Z) - Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as prompts.
In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values.
We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
arXiv Detail & Related papers (2023-09-07T00:07:15Z) - Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of performance-improving edits made by human programmers of over 77,000 competitive C++ programming submission pairs.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - Enabling Retargetable Optimizing Compilers for Quantum Accelerators via
a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.