Learning to Improve Code Efficiency
- URL: http://arxiv.org/abs/2208.05297v1
- Date: Tue, 9 Aug 2022 01:28:30 GMT
- Title: Learning to Improve Code Efficiency
- Authors: Binghong Chen, Daniel Tarlow, Kevin Swersky, Martin Maas, Pablo
Heiber, Ashish Naik, Milad Hashemi, Parthasarathy Ranganathan
- Abstract summary: We analyze a large competitive programming dataset from the Google Code Jam competition.
We find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions.
We propose using machine learning to automatically provide prescriptive feedback in the form of hints, to guide programmers towards writing high-performance code.
- Score: 27.768476489523163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Improvements in the performance of computing systems, driven by Moore's Law,
have transformed society. As such hardware-driven gains slow down, it becomes
even more important for software developers to focus on performance and
efficiency during development. While several studies have demonstrated the
potential from such improved code efficiency (e.g., 2x better generational
improvements compared to hardware), unlocking these gains in practice has been
challenging. Reasoning about algorithmic complexity and how coding patterns
interact with hardware can be difficult for the average programmer,
especially when combined with pragmatic constraints around development velocity
and multi-person development.
This paper seeks to address this problem. We analyze a large competitive
programming dataset from the Google Code Jam competition and find that
efficient code is indeed rare, with a 2x runtime difference between the median
and the 90th percentile of solutions. We propose using machine learning to
automatically provide prescriptive feedback in the form of hints, to guide
programmers towards writing high-performance code. To automatically learn these
hints from the dataset, we propose a novel discrete variational auto-encoder,
where each discrete latent variable represents a different learned category of
code-edit that increases performance. We show that this method represents the
multi-modal space of code efficiency edits better than a sequence-to-sequence
baseline and generates a distribution of more efficient solutions.
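Discrete latent variables like the ones described above are commonly trained with a continuous relaxation such as the Gumbel-softmax, which lets gradients flow through an (approximately) categorical choice of edit category. The sketch below illustrates that sampling step only; the category names, logits, and temperature are illustrative assumptions, not details from the paper.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Sample a relaxed one-hot vector over K discrete categories."""
    # Add Gumbel(0, 1) noise to each logit, then apply a
    # temperature-scaled softmax; low temperatures approach one-hot.
    noisy = [(l - math.log(-math.log(rng.random()))) / temperature
             for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Each latent category would stand for one learned kind of
# performance edit (names here are hypothetical examples).
edit_categories = ["replace-data-structure", "hoist-loop-invariant", "memoize"]
rng = random.Random(0)
probs = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
chosen = edit_categories[probs.index(max(probs))]
```

At training time the relaxed vector `probs` is used in place of a hard one-hot sample so the encoder and decoder remain differentiable; at inference one can take the argmax to pick a concrete edit category.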
Related papers
- ChatGPT Code Detection: Techniques for Uncovering the Source of Code [0.0]
We use advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT.
We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms.
We show that untrained humans solve the same task no better than random guessing.
arXiv Detail & Related papers (2024-05-24T12:56:18Z)
- Supercompiler Code Optimization with Zero-Shot Reinforcement Learning [63.164423329052404]
We present CodeZero, an artificial intelligence agent trained extensively on large-scale data to produce an effective optimization strategy for each program instantly, in a single trial.
Our methodology demonstrates the potential of artificial intelligence for software engineering and paves the way for scaling machine learning techniques in the realm of code optimization.
arXiv Detail & Related papers (2024-04-24T09:20:33Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 pairs of competitive C++ programming submissions, capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
- Chatbots As Fluent Polyglots: Revisiting Breakthrough Code Snippets [0.0]
The research applies AI-driven code assistants to analyze a selection of influential computer code that has shaped modern technology.
The original contribution of this study was to examine half of the most significant code advances in the last 50 years.
arXiv Detail & Related papers (2023-01-05T23:17:17Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Rate Coding or Direct Coding: Which One is Better for Accurate, Robust, and Energy-efficient Spiking Neural Networks? [4.872468969809081]
Most work on Spiking Neural Networks (SNNs) focuses on image classification; therefore, various coding techniques have been proposed to convert an image into temporal binary spikes.
Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system.
We conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency.
arXiv Detail & Related papers (2022-01-31T16:18:07Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but have so far been hard to apply to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph, which is updated by decoding within an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.