Learning to Improve Code Efficiency
- URL: http://arxiv.org/abs/2208.05297v1
- Date: Tue, 9 Aug 2022 01:28:30 GMT
- Title: Learning to Improve Code Efficiency
- Authors: Binghong Chen, Daniel Tarlow, Kevin Swersky, Martin Maas, Pablo
Heiber, Ashish Naik, Milad Hashemi, Parthasarathy Ranganathan
- Abstract summary: We analyze a large competitive programming dataset from the Google Code Jam competition.
We find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions.
We propose using machine learning to automatically provide prescriptive feedback in the form of hints, to guide programmers towards writing high-performance code.
- Score: 27.768476489523163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Improvements in the performance of computing systems, driven by Moore's Law,
have transformed society. As such hardware-driven gains slow down, it becomes
even more important for software developers to focus on performance and
efficiency during development. While several studies have demonstrated the
potential from such improved code efficiency (e.g., 2x better generational
improvements compared to hardware), unlocking these gains in practice has been
challenging. Reasoning about algorithmic complexity and the interaction of
coding patterns on hardware can be challenging for the average programmer,
especially when combined with pragmatic constraints around development velocity
and multi-person development.
This paper seeks to address this problem. We analyze a large competitive
programming dataset from the Google Code Jam competition and find that
efficient code is indeed rare, with a 2x runtime difference between the median
and the 90th percentile of solutions. We propose using machine learning to
automatically provide prescriptive feedback in the form of hints, to guide
programmers towards writing high-performance code. To automatically learn these
hints from the dataset, we propose a novel discrete variational auto-encoder,
where each discrete latent variable represents a different learned category of
code-edit that increases performance. We show that this method represents the
multi-modal space of code efficiency edits better than a sequence-to-sequence
baseline and generates a distribution of more efficient solutions.
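The abstract describes the core modeling idea but no implementation is given here. Below is a minimal sketch of a categorical-latent ("discrete") VAE over code edits, trained with the Gumbel-softmax relaxation; the tokenization, module sizes, GRU encoder/decoder, and all names are illustrative assumptions, not the authors' released model.

```python
# Minimal sketch of a discrete (categorical-latent) VAE over code edits.
# Assumptions: code is pre-tokenized into integer id tensors; model sizes,
# the GRU encoder/decoder, and the Gumbel-softmax relaxation are illustrative
# choices, not the paper's released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, K = 8000, 256, 16   # token vocab, hidden size, number of edit categories


class EditVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.enc_slow = nn.GRU(DIM, DIM, batch_first=True)
        self.enc_fast = nn.GRU(DIM, DIM, batch_first=True)
        self.to_logits = nn.Linear(2 * DIM, K)       # q(z | slow, fast)
        self.latent_embed = nn.Embedding(K, DIM)     # one vector per learned edit category
        self.decoder = nn.GRU(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def encode(self, slow_ids, fast_ids):
        _, h_s = self.enc_slow(self.embed(slow_ids))
        _, h_f = self.enc_fast(self.embed(fast_ids))
        return self.to_logits(torch.cat([h_s[-1], h_f[-1]], dim=-1))

    def decode(self, slow_ids, z_onehot, fast_ids):
        # Condition the decoder on the slow program and the sampled edit category.
        _, h_s = self.enc_slow(self.embed(slow_ids))
        h0 = (h_s[-1] + z_onehot @ self.latent_embed.weight).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(fast_ids[:, :-1]), h0)  # teacher forcing
        return self.out(dec_out)

    def forward(self, slow_ids, fast_ids, tau=1.0):
        logits = self.encode(slow_ids, fast_ids)
        z = F.gumbel_softmax(logits, tau=tau, hard=True)             # discrete latent sample
        recon = self.decode(slow_ids, z, fast_ids)
        recon_loss = F.cross_entropy(
            recon.reshape(-1, VOCAB), fast_ids[:, 1:].reshape(-1))
        q = F.softmax(logits, dim=-1)
        kl = (q * (torch.log(q + 1e-9) - torch.log(torch.tensor(1.0 / K)))).sum(-1).mean()
        return recon_loss + kl


# At inference time, each of the K categories can be decoded in turn from a slow
# program to propose a distribution of candidate efficiency edits ("hints").
model = EditVAE()
slow = torch.randint(0, VOCAB, (4, 32))
fast = torch.randint(0, VOCAB, (4, 32))
loss = model(slow, fast)
loss.backward()
```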
Related papers
- Effi-Code: Unleashing Code Efficiency in Language Models [17.355845751737423]
Effi-Code is an approach to enhancing code generation in large language models, offering a scalable and generalizable way to improve code generation in AI systems.
arXiv Detail & Related papers (2024-10-14T07:05:51Z)
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
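CodeDPO's exact recipe is described in its paper; the sketch below only illustrates the generic direct-preference-optimization (DPO) loss that preference learning over (better, worse) code pairs typically builds on. The tensor names and toy values are assumptions, not CodeDPO's implementation.

```python
# Generic DPO-style preference loss over (preferred, rejected) code completions.
# Illustrative sketch only; log-probabilities are assumed to be summed over
# the completion tokens already.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Encourage the policy to rank correct/efficient code above worse code."""
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up per-sample log-probabilities.
loss = dpo_loss(torch.tensor([-12.3, -8.1]), torch.tensor([-15.0, -9.7]),
                torch.tensor([-13.0, -8.5]), torch.tensor([-14.2, -9.5]))
```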
- An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications in software engineering.
CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
arXiv Detail & Related papers (2024-08-28T09:19:09Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 pairs of competitive C++ programming submissions, capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
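The prompting strategies mentioned in the entry above can be illustrated with a small sketch: retrieve the most similar slow programs as few-shot exemplars and prepend a target performance tag. The exemplar store, similarity measure, and tag format here are hypothetical, not the paper's implementation.

```python
# Illustrative sketch of retrieval-based few-shot prompting with a
# performance-conditioning tag, in the spirit of the entry above.
from difflib import SequenceMatcher

EXEMPLARS = [
    {"slow": "for i in range(n):\n    if x in seen_list: ...",
     "fast": "seen = set(seen_list)\nfor i in range(n):\n    if x in seen: ..."},
    # ... more (slow, fast) pairs mined from submission histories ...
]

def retrieve(slow_code, k=2):
    """Pick the k exemplars whose slow version looks most like the query."""
    scored = sorted(EXEMPLARS,
                    key=lambda e: SequenceMatcher(None, e["slow"], slow_code).ratio(),
                    reverse=True)
    return scored[:k]

def build_prompt(slow_code, target_tier="10/10"):
    parts = [f"# target performance tier: {target_tier}"]
    for ex in retrieve(slow_code):
        parts += ["# slow version:", ex["slow"], "# faster version:", ex["fast"]]
    parts += ["# slow version:", slow_code, "# faster version:"]
    return "\n".join(parts)

print(build_prompt("result = []\nfor i in range(n):\n    result = result + [i * i]"))
```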
- Chatbots As Fluent Polyglots: Revisiting Breakthrough Code Snippets [0.0]
The research applies AI-driven code assistants to analyze a selection of influential computer code that has shaped modern technology.
The original contribution of this study was to examine half of the most significant code advances in the last 50 years.
arXiv Detail & Related papers (2023-01-05T23:17:17Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
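As a generic sketch of the in-batch contrastive objective that semantic code search models of this kind commonly use: query and code embeddings are aligned so that matching pairs sit on the diagonal of a similarity matrix. The encoders, temperature, and the paper's soft data augmentation are not shown and are assumptions here.

```python
# Minimal in-batch contrastive (InfoNCE) loss between natural-language query
# embeddings and code embeddings, as commonly used for semantic code search.
# Generic sketch, not the paper's exact model or augmentation scheme.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, code_emb, temperature=0.05):
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    logits = q @ c.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(q.size(0))         # matching pairs lie on the diagonal
    # Symmetric loss: query->code and code->query retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```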
- Rate Coding or Direct Coding: Which One is Better for Accurate, Robust, and Energy-efficient Spiking Neural Networks? [4.872468969809081]
Most works on Spiking Neural Networks (SNNs) focus on image classification tasks; therefore, various coding techniques have been proposed to convert an image into temporal binary spikes.
Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system.
We conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency.
arXiv Detail & Related papers (2022-01-31T16:18:07Z)
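A small illustration of the two input codings compared in the entry above, assuming a normalized image input; the timestep count and tensor shapes are arbitrary choices, not the paper's setup.

```python
# Sketch of rate coding vs. direct coding for an SNN input in [0, 1].
import torch

T = 8                                   # number of simulation timesteps
image = torch.rand(1, 1, 28, 28)        # toy normalized input

# Rate coding: each pixel fires a Bernoulli spike per timestep with
# probability proportional to its intensity.
rate_spikes = (torch.rand(T, *image.shape) < image).float()   # (T, 1, 1, 28, 28)

# Direct coding: the analog pixel values are fed to the first layer at every
# timestep, and that layer's neurons emit the first spikes.
direct_input = image.unsqueeze(0).repeat(T, 1, 1, 1, 1)        # (T, 1, 1, 28, 28)

print(rate_spikes.mean().item(), direct_input.mean().item())
```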
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used on large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
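One of the algorithmic ingredients that large-scale kernel solvers of this kind combine is the Nystrom approximation; the sketch below shows a generic Nystrom kernel ridge regression with a direct solve. Sizes, the Gaussian kernel, and the solver choice are illustrative; the paper's solver additionally uses preconditioned iterative methods and careful GPU memory management.

```python
# Generic Nystrom-approximated kernel ridge regression (illustrative sketch).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n, m, d, lam = 20000, 500, 10, 1e-3        # points, landmarks, features, ridge

X = torch.randn(n, d, device=device)
y = torch.randn(n, 1, device=device)
landmarks = X[torch.randperm(n, device=device)[:m]]

def gaussian_kernel(A, B, sigma=1.0):
    return torch.exp(-torch.cdist(A, B) ** 2 / (2 * sigma ** 2))

K_nm = gaussian_kernel(X, landmarks)        # (n, m) cross-kernel
K_mm = gaussian_kernel(landmarks, landmarks)

# Solve (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T y  (Nystrom KRR system).
A = K_nm.t() @ K_nm + lam * n * K_mm + 1e-6 * torch.eye(m, device=device)
alpha = torch.linalg.solve(A, K_nm.t() @ y)

pred = gaussian_kernel(X[:5], landmarks) @ alpha   # predictions for a few points
print(pred.flatten())
```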
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.