Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
- URL: http://arxiv.org/abs/2505.02184v2
- Date: Wed, 05 Nov 2025 03:55:53 GMT
- Title: Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
- Authors: Matthew T. Dearing, Yiheng Tao, Xingfu Wu, Zhiling Lan, Valerie Taylor
- Abstract summary: Large language models (LLMs) are increasingly used for generating parallel scientific codes. We propose LASSI-EE, an automated screening framework that generates energy-efficient parallel codes. We introduce energy-reduction@k, a novel metric that quantifies expected energy reduction when generating k code candidates.
- Score: 1.2178992475191555
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While large language models (LLMs) are increasingly used for generating parallel scientific codes, most efforts emphasize functional correctness, often overlooking performance, especially energy efficiency. We propose LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel codes through a multi-stage, iterative approach integrating runtime power profiling, energy-aware prompting, self-correcting feedback loops, and an LLM-as-a-Judge agent for automated screening of code solutions. We introduce energy-reduction@k, a novel metric that quantifies expected energy reduction when generating k code candidates and selecting the most energy-efficient, enabling systematic evaluation of multi-attempt generation strategies. Evaluating 20 HeCBench applications and two miniApps on NVIDIA A100 and AMD MI100 GPUs, a single run (k=1) with LASSI-EE delivers refactored parallel codes with an average 29% expected energy reduction at an 81% pass rate, representing a 2.8x improvement over vanilla LLM prompting. Multiple runs (k=3) achieve an average 48% expected energy reduction at a 97% pass rate. These results are consistent across devices, demonstrating LASSI-EE's effectiveness across diverse hardware architectures.
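As a hedged illustration of the energy-reduction@k idea, the expected best-of-k reduction can be estimated from n sampled candidates with a combinatorial best-of-k estimator analogous to the pass@k estimator; the function name and exact formulation below are assumptions for illustration, not the paper's definition:

```python
from math import comb

def energy_reduction_at_k(reductions, k):
    """Expected best-of-k energy reduction, estimated from n sampled
    candidates (a sketch; the paper's exact definition may differ).

    `reductions` holds the measured fractional energy reduction of each
    generated candidate; failed candidates can be encoded as 0.0.
    For a uniformly random k-subset of the n samples, the expected
    maximum is  sum_i C(i, k-1) / C(n, k) * r[i]  over r sorted
    ascending (0-indexed i).
    """
    n = len(reductions)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    r = sorted(reductions)
    # r[i] is the maximum of a k-subset exactly when the other k-1
    # elements are drawn from the i smaller values: comb(i, k-1) ways.
    return sum(comb(i, k - 1) * r[i] for i in range(n)) / comb(n, k)
```

With k=1 this reduces to the mean reduction over all candidates, and with k=n it returns the single best candidate's reduction, matching the intuition that more attempts with best-of selection can only help.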
Related papers
- Determining Energy Efficiency Sweet Spots in Production LLM Inference [1.633285971584668]
Existing approaches estimate energy consumption through simple linear functions of input and output sequence lengths. We propose an analytical model derived from the computational and memory-access complexity of the Transformer architecture. Our results show that aligning sequence lengths with these efficiency "Sweet Spots" can substantially reduce energy usage.
arXiv Detail & Related papers (2026-02-05T14:21:00Z) - Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging [8.58416976020519]
A significant but overlooked source of inefficiency is software energy waste caused by poor software design. These inefficiencies arise in widely used ML frameworks and applications, yet developers often lack the visibility and tools to detect and diagnose them. We propose differential energy debugging, a novel approach that leverages the observation that competing ML systems often implement similar functionality with vastly different energy consumption. Building on this insight, we design and implement Magneton, an energy profiler that compares energy consumption between similar ML systems at the operator level and automatically pinpoints code regions and configuration choices responsible for excessive energy use.
arXiv Detail & Related papers (2025-12-09T08:41:16Z) - Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency. We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
arXiv Detail & Related papers (2025-07-22T05:51:07Z) - Evaluating the Energy-Efficiency of the Code Generated by LLMs [2.1983110147455482]
This paper investigates the energy efficiency of the code generated by 20 popular Large Language Models for 878 programming problems. Among the studied LLMs, DeepSeek-v3 and GPT-4o generate the most energy-efficient code. For specific algorithmic groups such as dynamic programming, backtracking, and bit manipulation, LLM-generated code can consume up to 450 times more energy than human-generated canonical solutions.
arXiv Detail & Related papers (2025-05-23T18:13:27Z) - R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z) - Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency [6.306413686006502]
We conduct a comprehensive analysis of 28 quantized Large Language Models (LLMs) from the Ollama library. We evaluate energy efficiency, inference performance, and output accuracy across multiple quantization levels and task types. Our findings reveal the trade-offs between energy efficiency, inference speed, and accuracy in different quantization settings.
arXiv Detail & Related papers (2025-04-04T11:29:30Z) - TuRTLe: A Unified Evaluation of LLMs for RTL Generation [0.6010802600885173]
We propose TuRTLe, a unified evaluation framework designed to assess LLMs across key RTL generation tasks. We benchmark a diverse set of open LLMs and analyze their strengths and weaknesses in EDA-specific tasks. Our results show that reasoning-based models, such as DeepSeek R1, consistently outperform others across multiple evaluation criteria.
arXiv Detail & Related papers (2025-03-31T07:43:12Z) - Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations [45.243401722182554]
Large language models (LLMs) claim to assist developers in optimizing code for performance and energy efficiency. This work focuses on software written in MATLAB, which is widely used in both academia and industry for scientific and engineering applications. We analyze energy-focused optimization on 400 scripts across 100 top GitHub repositories.
arXiv Detail & Related papers (2025-03-26T00:27:29Z) - ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness [7.3895963946365795]
Large Language Models (LLMs) have emerged as a promising tool for HDL generation. Existing benchmarks for LLM-based code generation primarily focus on functional correctness while overlooking hardware resource usage. We introduce ResBench, the first resource-focused benchmark explicitly designed to distinguish between resource-optimized and inefficient LLM-generated HDL code.
arXiv Detail & Related papers (2025-03-11T18:54:17Z) - GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation [1.5749416770494706]
This work proposes a framework for energy-aware code generation in Large Language Models (LLMs). We train a Reinforcement Learning (RL) agent that learns to balance the trade-offs between accuracy, latency, and energy consumption. Results show that our method reduces energy consumption by 23-50% on average for code generation tasks without significantly affecting accuracy.
arXiv Detail & Related papers (2025-01-19T10:44:03Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in computational costs of LLM by 5.2-6.5x and GPU memory of LLM by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z) - DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation [48.11754113512047]
This study includes a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains.
Our pipeline works in a fully automated manner, enabling push-button construction from code repositories into formatted subjects under study.
The contributions of this study include a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains, a fully automated pipeline for constructing code benchmarks, and an identification of the limitations of LLMs in code generation tasks based on their performance on DOMAINEVAL.
arXiv Detail & Related papers (2024-08-23T16:33:58Z) - InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
To our knowledge, InfiBench is the first large-scale freeform question-answering (QA) benchmark for code.
It comprises 234 carefully selected high-quality Stack Overflow questions that span 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z) - Mercury: A Code Efficiency Benchmark for Code Large Language Models [41.51235610016959]
We present Mercury, the first code efficiency benchmark for Large Language Models for Code (Code LLMs).
It comprises 1,889 Python tasks, each accompanied by adequate solutions that serve as real-world efficiency baselines.
We introduce a new metric Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and code efficiency simultaneously.
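A runtime-percentile-weighted pass score of this kind can be sketched as follows; this is a hypothetical reading of the idea, with an assumed function name, and Mercury's exact Beyond formula may differ:

```python
from bisect import bisect_left

def beyond_score(passed, runtime, baseline_runtimes):
    """Sketch of a runtime-percentile-weighted pass score.

    A failing solution scores 0. A passing solution scores the fraction
    of baseline solutions it is at least as fast as, so functional
    correctness and runtime efficiency are rewarded jointly.
    """
    if not passed:
        return 0.0
    baselines = sorted(baseline_runtimes)
    # Count of baseline solutions that are slower than or as slow as
    # this solution (i.e., with runtime >= ours).
    slower_or_equal = len(baselines) - bisect_left(baselines, runtime)
    return slower_or_equal / len(baselines)
```

Under this reading, a correct solution faster than every baseline scores 1.0, while a correct but slow solution scores near 0, which is the stated goal of reflecting correctness and efficiency simultaneously.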
arXiv Detail & Related papers (2024-02-12T17:53:22Z) - StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model by masking unexecuted code segments, providing Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Distributed Inference and Fine-tuning of Large Language Models Over The Internet [91.00270820533272]
Large language models (LLMs) are useful in many NLP tasks and become more capable with size.
These models require high-end hardware, making them inaccessible to most researchers.
We develop fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput.
arXiv Detail & Related papers (2023-12-13T18:52:49Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.