Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
- URL: http://arxiv.org/abs/2505.02184v1
- Date: Sun, 04 May 2025 17:05:34 GMT
- Title: Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
- Authors: Matthew T. Dearing, Yiheng Tao, Xingfu Wu, Zhiling Lan, Valerie Taylor
- Abstract summary: Large language models (LLMs) are increasingly used for generating parallel scientific code. In this work, we propose LASSI-EE, an automated LLM-based framework that generates energy-efficient parallel code. Through a multi-stage, iterative pipeline process, LASSI-EE achieved an average energy reduction of 47% across 85% of the 20 HeCBench benchmarks.
- Score: 0.23301643766310373
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While large language models (LLMs) are increasingly used for generating parallel scientific code, most current efforts emphasize functional correctness, often overlooking performance and energy considerations. In this work, we propose LASSI-EE, an automated LLM-based refactoring framework that, given a parallel code as input, generates energy-efficient parallel code for a target parallel system. Through a multi-stage, iterative pipeline process, LASSI-EE achieved an average energy reduction of 47% across 85% of the 20 HeCBench benchmarks tested on NVIDIA A100 GPUs. Our findings demonstrate the broader potential of LLMs, not only for generating correct code but also for enabling energy-aware programming. We also discuss key insights and limitations within the framework, offering valuable guidance for future improvements.
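The abstract reports energy reductions measured on NVIDIA A100 GPUs but does not spell out the measurement mechanics, so the following is a minimal sketch, assuming NVML's total-energy counter (via the pynvml bindings) is used to compare a baseline and a refactored code; the benchmark executables named here are hypothetical placeholders, not part of LASSI-EE.

```python
# Minimal sketch (not the LASSI-EE implementation): compare GPU energy use of a
# baseline and a refactored code with NVML's total-energy counter, which is
# available on NVIDIA Volta-class GPUs and newer such as the A100.
import subprocess
import pynvml

def measure_energy_mj(cmd):
    """Run `cmd` to completion and return GPU energy consumed, in millijoules."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        start = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # mJ since driver load
        subprocess.run(cmd, check=True)
        end = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    finally:
        pynvml.nvmlShutdown()
    return end - start

if __name__ == "__main__":
    # Hypothetical binaries standing in for the original and refactored codes.
    baseline_mj = measure_energy_mj(["./benchmark_baseline"])
    refactored_mj = measure_energy_mj(["./benchmark_refactored"])
    print(f"Energy reduction: {100.0 * (baseline_mj - refactored_mj) / baseline_mj:.1f}%")
```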
Related papers
- TuRTLe: A Unified Evaluation of LLMs for RTL Generation [0.6010802600885173]
We propose TuRTLe, a unified evaluation framework designed to assess LLMs across key RTL generation tasks.
We benchmark a diverse set of open LLMs and analyze their strengths and weaknesses in EDA-specific tasks.
Our results show that reasoning-based models, such as DeepSeek R1, consistently outperform others across multiple evaluation criteria.
arXiv Detail & Related papers (2025-03-31T07:43:12Z)
- Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations [45.243401722182554]
Large language models (LLMs) claim to assist developers in optimizing code for performance and energy efficiency.
This work focuses on software written in Matlab, which is widely used in both academia and industry for scientific and engineering applications.
We analyze energy-focused optimization on 400 scripts across 100 top GitHub repositories.
arXiv Detail & Related papers (2025-03-26T00:27:29Z)
- ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness [7.3895963946365795]
Large Language Models (LLMs) have emerged as a promising tool for HDL generation.
Existing benchmarks for LLM-based code generation primarily focus on functional correctness while overlooking hardware resource usage.
We introduce ResBench, the first resource-focused benchmark explicitly designed to distinguish between resource-optimized and inefficient LLM-generated HDL code.
arXiv Detail & Related papers (2025-03-11T18:54:17Z)
- GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation [1.5749416770494706]
This work proposes a framework for energy-aware code generation in Large Language Models (LLMs).
We train a Reinforcement Learning (RL) agent that learns to balance the trade-offs between accuracy, latency, and energy consumption.
Results show that our method reduces energy consumption by 23-50% on average for code generation tasks without significantly affecting accuracy.
arXiv Detail & Related papers (2025-01-19T10:44:03Z)
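The GREEN-CODE summary above describes an RL agent that balances accuracy, latency, and energy, but gives no reward definition; the snippet below is a hedged sketch of one plausible weighted reward, with the linear form, weights, and field names all assumed for illustration.

```python
# Hedged sketch of an accuracy/latency/energy trade-off reward; the linear form
# and the weights are assumptions for illustration, not GREEN-CODE's design.
from dataclasses import dataclass

@dataclass
class GenerationOutcome:
    passed_tests: bool   # functional correctness of the generated code
    latency_s: float     # wall-clock generation latency in seconds
    energy_j: float      # energy consumed during generation in joules

def reward(outcome, w_acc=1.0, w_lat=0.01, w_energy=0.001):
    """Higher is better: reward correctness, penalize latency and energy."""
    acc_term = w_acc if outcome.passed_tests else 0.0
    return acc_term - w_lat * outcome.latency_s - w_energy * outcome.energy_j

print(reward(GenerationOutcome(passed_tests=True, latency_s=2.5, energy_j=120.0)))
```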
- PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks.
We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z)
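PerfCodeGen is described above only as a training-free framework driven by execution feedback; the loop below is a minimal sketch of that idea under assumed interfaces, where `generate_code` and `run_tests` are hypothetical callables rather than the paper's actual prompts or tooling.

```python
# Minimal sketch of an execution-feedback refinement loop in the spirit of
# PerfCodeGen: generate code, execute its tests, and feed timing or failure
# information back into the next prompt. All helpers are placeholders.
def refine_with_execution_feedback(task, generate_code, run_tests, rounds=3):
    prompt = task
    best_code, best_runtime = None, float("inf")
    for _ in range(rounds):
        code = generate_code(prompt)          # LLM call (placeholder)
        passed, runtime_s = run_tests(code)   # execute unit tests and time them
        if passed and runtime_s < best_runtime:
            best_code, best_runtime = code, runtime_s
        feedback = (f"Tests passed in {runtime_s:.3f} s; make it faster."
                    if passed else "Tests failed; fix correctness first.")
        prompt = f"{task}\n\nPrevious attempt:\n{code}\n\nFeedback: {feedback}"
    return best_code

if __name__ == "__main__":
    # Trivial stubs so the sketch runs end to end.
    demo_generate = lambda prompt: "def solve(xs):\n    return sorted(xs)"
    demo_run = lambda code: (True, 0.01)
    print(refine_with_execution_feedback("Sort a list.", demo_generate, demo_run))
```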
- DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation [48.11754113512047]
This study includes a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains.
Our pipeline works in a fully automated manner, enabling push-button construction from code repositories into formatted subjects under study.
The contributions of this study include a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains, a fully automated pipeline for constructing code benchmarks, and an identification of the limitations of LLMs in code generation tasks based on their performance on DOMAINEVAL.
arXiv Detail & Related papers (2024-08-23T16:33:58Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
To our knowledge, InfiBench is the first large-scale freeform question-answering (QA) benchmark for code.
It comprises 234 carefully selected high-quality Stack Overflow questions that span 15 programming languages.
We conduct a systematic evaluation of over 100 recent code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- Mercury: A Code Efficiency Benchmark for Code Large Language Models [41.51235610016959]
We present Mercury, the first code efficiency benchmark for Large Language Models for Code (Code LLMs).
It comprises 1,889 Python tasks, each accompanied by adequate solutions that serve as real-world efficiency baselines.
We introduce a new metric, Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and code efficiency simultaneously.
arXiv Detail & Related papers (2024-02-12T17:53:22Z)
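The Beyond metric is summarized above only as a runtime-percentile-weighted Pass score; the function below is a hedged sketch of one way such a score could be computed, crediting a passing solution by the fraction of reference runtimes it matches or beats and scoring a failure as 0. The exact weighting used in Mercury may differ.

```python
# Hedged sketch of a runtime-percentile-weighted pass score in the spirit of
# Mercury's Beyond metric; the paper's precise definition may differ.
def beyond_score(results):
    """
    results: list of (passed, runtime_s, reference_runtimes) per task, where
    reference_runtimes are measured runtimes of known-correct solutions.
    Failing tasks score 0; passing tasks score the fraction of reference
    solutions they are at least as fast as. Returns the average over tasks.
    """
    per_task = []
    for passed, runtime_s, reference_runtimes in results:
        if not passed:
            per_task.append(0.0)
            continue
        beaten = sum(1 for r in reference_runtimes if runtime_s <= r)
        per_task.append(beaten / len(reference_runtimes))
    return sum(per_task) / len(per_task)

# One task beats 2 of 3 references, the other fails: score = (2/3 + 0) / 2.
print(beyond_score([(True, 0.8, [0.5, 1.0, 2.0]), (False, 0.2, [0.5, 1.0, 2.0])]))
```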
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
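FGO is summarized above as masking unexecuted code segments during optimization; the snippet below is a framework-free sketch of that masking idea only, with per-token losses and an execution-coverage mask as assumed inputs (StepCoder's actual objective is an RL loss over generated code).

```python
# Hedged sketch of fine-grained loss masking in the spirit of StepCoder's FGO:
# only tokens in code segments that actually executed contribute to the loss.
# The inputs here are assumed placeholders, not the paper's training signals.
def masked_loss(token_losses, executed_mask):
    """Average per-token loss over executed tokens only (mask of 0/1 flags)."""
    assert len(token_losses) == len(executed_mask)
    kept = [loss for loss, executed in zip(token_losses, executed_mask) if executed]
    return sum(kept) / len(kept) if kept else 0.0

# Tokens 3 and 4 fall in an unexecuted branch, so they are ignored:
# (0.9 + 0.4 + 0.3) / 3 = 0.533...
print(masked_loss([0.9, 0.4, 1.2, 1.1, 0.3], [1, 1, 0, 0, 1]))
```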
- Distributed Inference and Fine-tuning of Large Language Models Over The Internet [91.00270820533272]
Large language models (LLMs) are useful in many NLP tasks and become more capable with size.
These models require high-end hardware, making them inaccessible to most researchers.
We develop fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput.
arXiv Detail & Related papers (2023-12-13T18:52:49Z)
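The summary above mentions load-balancing protocols that assign devices to maximize total system throughput; as a rough illustration only, and not the paper's fault-tolerant protocol, the greedy heuristic below assigns transformer blocks to whichever device keeps the pipeline bottleneck fastest, with the device speeds invented for the example.

```python
# Rough illustration (not the paper's protocol): greedily place model blocks on
# devices so the slowest device, which bottlenecks pipeline throughput, stays
# as fast as possible. Device speeds here are made-up example numbers.
def assign_blocks(num_blocks, device_speeds):
    """device_speeds[i]: blocks/s device i can serve; returns blocks per device."""
    assigned = [0] * len(device_speeds)
    for _ in range(num_blocks):
        # Give the next block to the device whose per-block throughput stays highest.
        i = max(range(len(device_speeds)),
                key=lambda d: device_speeds[d] / (assigned[d] + 1))
        assigned[i] += 1
    return assigned

# e.g., 32 transformer blocks spread over three devices of unequal speed;
# faster devices end up hosting proportionally more blocks.
print(assign_blocks(32, [10.0, 5.0, 2.5]))
```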
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents on end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)