Related papers: Measuring Code Efficiency Optimization Capabilities with ACEOB

URL: http://arxiv.org/abs/2408.12960v1
Date: Fri, 23 Aug 2024 10:10:37 GMT
Title: Measuring Code Efficiency Optimization Capabilities with ACEOB
Authors: Yue Pan, Xiuting Shao, Chen Lyu
Abstract summary: We conduct an in-depth analysis of "code patterns" in the model training dataset, meticulously exploring human-written code. We introduce the Automatic Code Efficiency Optimization Benchmark (ACEOB), which consists of 95,359 pairs of efficient-inefficient code. To our knowledge, ACEOB is the first dataset specifically targeting Python code efficiency optimization.
Score: 7.4056083791645495
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Moore's Law gains diminish, software performance and efficiency become increasingly vital. Optimizing code efficiency is challenging, even for professional programmers. However, related research remains relatively scarce, and rigorously assessing models' abilities to optimize code efficiency is fraught with difficulties. In response to this challenge, we first conduct an in-depth analysis of "code patterns" in the model training dataset, meticulously exploring human-written code. Secondly, we define a task for optimizing code efficiency and introduce the Automatic Code Efficiency Optimization Benchmark (ACEOB), which consists of 95,359 pairs of efficient-inefficient code aimed at assessing code efficiency optimization capabilities. To our knowledge, ACEOB is the first dataset specifically targeting Python code efficiency optimization. To evaluate models' ability in optimizing code efficiency, we propose two new metrics: the Isomorphic Optimal Comparison CodeBLEU (IOCCB) metric and the Normalized Performance Index (NPI) metric, to assess the efficiency of model-generated code. We also evaluate several advanced code models, such as PolyCoder and CodeT5, after fine-tuning them on ACEOB and demonstrate that the efficiency of each model improves after introducing the NPI filter. However, it was observed that even ChatGPT does not perform optimally in code efficiency optimization tasks.

Related papers

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness [38.399282089600284]
Large Language Models (LLMs) have demonstrated impressive performance in code generation. LLM4EFFI (Large Language Model for Code Efficiency) is a novel framework that enables LLMs to generate code that balances both efficiency and correctness.
arXiv Detail & Related papers (2025-02-17T07:01:18Z)
ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models [9.4219427550154]
Existing approaches for optimizing code efficiency for CodeLLMs like SOAP and PIE exhibit certain limitations. We introduce ACECode, a reinforcement learning-based fine-tuning framework that aligns CodeLLMs with dual objectives of efficiency and correctness. We evaluate ACECode by fine-tuning four SOTA (state-of-the-art) CodeLLMs and comparing their code with three baselines: original, instruction-tuned, and PIE-tuned CodeLLMs.
arXiv Detail & Related papers (2024-12-23T04:19:45Z)
Effi-Code: Unleashing Code Efficiency in Language Models [17.355845751737423]
Effi-Code is an approach to enhancing code generation in large language models. Effi-Code offers a scalable and generalizable approach to improving code generation in AI systems.
arXiv Detail & Related papers (2024-10-14T07:05:51Z)
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving. Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency. CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
Evaluating Language Models for Efficient Code Generation [13.175840119811]
We introduce Differential Performance Evaluation (DPE) to reliably evaluate Large Language Models (LLMs). DPE focuses on efficiency-demanding programming tasks and establishing an insightful compound metric for performance evaluation. As a proof of concept, we use DPE to create EvalPerf, a benchmark with 121 performance-challenging coding tasks.
arXiv Detail & Related papers (2024-08-12T18:59:13Z)
Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z)
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark [39.13045037676502]
Development of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Most evaluation frameworks focus on the (functional) correctness of generated code; efficiency, as an important measure of code quality, has been overlooked in existing evaluations. We develop ENAMEL, a rigorous and high-standard benchmark for evaluating the capability of LLMs in generating efficient code.
arXiv Detail & Related papers (2024-06-10T04:19:20Z)
Lower-Left Partial AUC: An Effective and Efficient Optimization Metric for Recommendation [52.45394284415614]
We propose a new optimization metric, Lower-Left Partial AUC (LLPAUC), which is computationally efficient like AUC but strongly correlates with Top-K ranking metrics. LLPAUC considers only the partial area under the ROC curve in the Lower-Left corner to push the optimization focus on Top-K.
arXiv Detail & Related papers (2024-02-29T13:58:33Z)
Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks [2.8961929092154697]
We test the performance of various optimizers on deep learning models for source code. We find that the choice of an optimizer can have a significant impact on the model quality. We suggest that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
arXiv Detail & Related papers (2023-03-06T22:49:20Z)
An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives. We show the advantages of ZO sign-based gradient descent (ZO-signGD). We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency. Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.