Toward Green Code: Prompting Small Language Models for Energy-Efficient Code Generation
- URL: http://arxiv.org/abs/2509.09947v2
- Date: Tue, 07 Oct 2025 17:06:21 GMT
- Title: Toward Green Code: Prompting Small Language Models for Energy-Efficient Code Generation
- Authors: Humza Ashraf, Syed Muhammad Danish, Shadikur Rahman, Zeeshan Sattar
- Abstract summary: There is a growing concern about the environmental impact of large language models (LLMs) in software development. This study investigates whether prompt engineering can improve the energy efficiency of SLMs in code generation. We evaluate four open-source SLMs, StableCode-Instruct-3B, Qwen2.5-Coder-3B-Instruct, CodeLlama-7B-Instruct, and Phi-3-Mini-4K-Instruct, across 150 Python problems from LeetCode.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing concern about the environmental impact of large language models (LLMs) in software development, particularly due to their high energy use and carbon footprint. Small Language Models (SLMs) offer a more sustainable alternative, requiring fewer computational resources while remaining effective for fundamental programming tasks. In this study, we investigate whether prompt engineering can improve the energy efficiency of SLMs in code generation. We evaluate four open-source SLMs, StableCode-Instruct-3B, Qwen2.5-Coder-3B-Instruct, CodeLlama-7B-Instruct, and Phi-3-Mini-4K-Instruct, across 150 Python problems from LeetCode, evenly distributed into easy, medium, and hard categories. Each model is tested under four prompting strategies: role prompting, zero-shot, few-shot, and chain-of-thought (CoT). For every generated solution, we measure runtime, memory usage, and energy consumption, comparing the results with a human-written baseline. Our findings show that CoT prompting provides consistent energy savings for Qwen2.5-Coder and StableCode-3B, while CodeLlama-7B and Phi-3-Mini-4K fail to outperform the baseline under any prompting strategy. These results highlight that the benefits of prompting are model-dependent and that carefully designed prompts can guide SLMs toward greener software development.
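The per-solution measurement loop the abstract describes can be sketched with the standard library alone. The harness below is an illustrative assumption, not the authors' code: the names `profile_solution` and `two_sum` are ours, and real energy figures would require a power-measurement interface such as Intel RAPL rather than the runtime and memory proxies shown here.

```python
import time
import tracemalloc

def profile_solution(fn, *args, repeats=5):
    """Measure mean runtime and peak memory of a candidate solution.

    A minimal sketch in the spirit of the study's evaluation; actual
    energy readings would need a hardware counter (e.g. RAPL), which
    is omitted here.
    """
    runtimes, peak_mem = [], 0
    for _ in range(repeats):
        tracemalloc.start()
        t0 = time.perf_counter()
        fn(*args)
        runtimes.append(time.perf_counter() - t0)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peak_mem = max(peak_mem, peak)
    return {"mean_runtime_s": sum(runtimes) / repeats,
            "peak_mem_bytes": peak_mem}

# Hypothetical generated solution to a LeetCode-style problem (Two Sum).
def two_sum(nums, target):
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

stats = profile_solution(two_sum, list(range(10_000)), 19_997)
```

Comparing such per-solution statistics against a human-written baseline, as the study does, then reduces to running the same harness on both implementations.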
Related papers
- Evaluating and Achieving Controllable Code Completion in Code LLM [89.64782747840225]
We present the first instruction-guided code completion benchmark, the Controllable Code Completion Benchmark (C3-Bench). We reveal substantial gaps in instruction-following capabilities between open-source and advanced proprietary models during code completion tasks. The resulting model, Qwen2.5-Coder-C3, achieves state-of-the-art performance on C3-Bench.
arXiv Detail & Related papers (2026-01-22T11:40:04Z) - Environment-Aware Code Generation: How far are We? [52.69113158357018]
It is unclear whether large language models (LLMs) can reliably generate executable code tailored to a user's specific environment. We present the first systematic study of Environment-Aware Code Generation (EACG), where generated code must be functionally correct and directly executable under arbitrary software configurations. Our results show that current LLMs struggle with environment-specific code generation, while our adaptations improve environment compatibility and executability.
arXiv Detail & Related papers (2026-01-18T04:58:15Z) - Code-enabled language models can outperform reasoning models on diverse tasks [86.29363856881399]
We show that standard instruct LMs can already be elicited to be strong reasoners without finetuning. This is achieved by CodeAdapt, where LMs interleave natural language reasoning with code execution in a multi-step fashion. We find that CodeAdapt enables three LMs to outperform the corresponding RMs on average over eight tasks.
arXiv Detail & Related papers (2025-10-23T18:04:03Z) - Energy-Aware Code Generation with LLMs: Benchmarking Small vs. Large Language Models for Sustainable AI Programming [2.588812622437082]
We evaluate open-source Small Language Models (SLMs) trained explicitly for code generation against Large Language Models (LLMs) and efficient human-written Python code. We evaluate 150 coding problems from LeetCode, evenly distributed across three difficulty levels: easy, medium, and hard. LLMs achieve the highest correctness across all difficulty levels, but SLMs are often more energy-efficient when their outputs are correct.
arXiv Detail & Related papers (2025-08-10T14:44:06Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - Evaluating the Energy-Efficiency of the Code Generated by LLMs [2.1983110147455482]
This paper investigates the energy efficiency of the code generated by 20 popular Large Language Models for 878 programming problems. Among the studied LLMs, DeepSeek-v3 and GPT-4o generate the most energy-efficient code. For specific algorithmic groups such as dynamic programming, backtracking, and bit manipulation, LLM-generated code can consume up to 450 times more energy than human-written canonical solutions.
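The large energy gaps reported for algorithmic groups like dynamic programming are easy to reproduce in miniature. The Fibonacci comparison below is an illustrative sketch of our own, not an example from the paper: the naive recursion stands in for an inefficient generated solution, and the memoized version for a canonical one.

```python
import time
from functools import lru_cache

def fib_naive(n):
    """Exponential-time recursion: the style of generated code that can
    burn far more energy than a canonical solution."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Canonical dynamic-programming version: linear work via memoization."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def timed(fn, n):
    t0 = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - t0

# Same answer, very different cost profile.
slow_result, slow_t = timed(fib_naive, 28)
fast_result, fast_t = timed(fib_memo, 28)
```

Since energy consumption tracks CPU time closely for compute-bound code, the runtime ratio between the two variants is a rough proxy for the energy ratio the paper measures directly.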
arXiv Detail & Related papers (2025-05-23T18:13:27Z) - Resource-Efficient & Effective Code Summarization [3.512140256677132]
GreenAI techniques, such as QLoRA, offer a promising path for dealing with large models' sustainability. Our study evaluates two state-of-the-art CLMs across two programming languages: Python and Java. Results show that QLoRA enables efficient fine-tuning of CLMs for code summarization.
arXiv Detail & Related papers (2025-02-05T21:06:30Z) - AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code [45.77395425799378]
This paper presents the first study analyzing the energy efficiency and performance of LLM-generated code for three programming languages: Python, Java, and C++. Our results show that the models are much more successful in generating Python and Java code than C++ code.
arXiv Detail & Related papers (2025-02-04T15:32:34Z) - GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation [1.5749416770494706]
This work proposes a framework for energy-aware code generation in Large Language Models (LLMs). We train a Reinforcement Learning (RL) agent that learns to balance the trade-offs between accuracy, latency, and energy consumption. Results show that our method reduces energy consumption by 23-50% on average for code generation tasks without significantly affecting accuracy.
arXiv Detail & Related papers (2025-01-19T10:44:03Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The logical comment decoding strategy is notably more robust than Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z) - CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.