Related papers: Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming

Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming

URL: http://arxiv.org/abs/2111.07875v1
Date: Mon, 15 Nov 2021 16:30:12 GMT
Title: Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming
Authors: Dominik Sobania, Martin Briesch, Franz Rothlauf
Abstract summary: GitHub Copilot is an extension for the Visual Studio Code development environment powered by the large-scale language model Codex. In this paper, we evaluate GitHub Copilot on standard program synthesis benchmark problems and compare the achieved results with those from the genetic programming literature. We find that the performance of the two approaches on the benchmark problems is quite similar, however, in comparison to GitHub Copilot, the program synthesis approaches based on genetic programming are not yet mature enough.
Score: 2.2559617939136505
License: http://creativecommons.org/licenses/by/4.0/
Abstract: GitHub Copilot, an extension for the Visual Studio Code development environment powered by the large-scale language model Codex, makes automatic program synthesis available for software developers. This model has been extensively studied in the field of deep learning, however, a comparison to genetic programming, which is also known for its performance in automatic program synthesis, has not yet been carried out. In this paper, we evaluate GitHub Copilot on standard program synthesis benchmark problems and compare the achieved results with those from the genetic programming literature. In addition, we discuss the performance of both approaches. We find that the performance of the two approaches on the benchmark problems is quite similar, however, in comparison to GitHub Copilot, the program synthesis approaches based on genetic programming are not yet mature enough to support programmers in practical software development. Genetic programming usually needs a huge amount of expensive hand-labeled training cases and takes too much time to generate solutions. Furthermore, source code generated by genetic programming approaches is often bloated and difficult to understand. For future work on program synthesis with genetic programming, we suggest researchers to focus on improving the execution time, readability, and usability.

Related papers

Comprehension-Performance Gap in GenAI-Assisted Brownfield Programming: A Replication and Extension [0.41998444721319217]
Code comprehension is essential for brownfield programming tasks.<n>Generative AI (GenAI) coding assistants such as GitHub Copilot have been shown to improve developer productivity.<n>We explore both performance and comprehension in GenAI-assisted brownfield programming tasks.
arXiv Detail & Related papers (2025-11-04T19:03:55Z)
A Human Centric Requirements Engineering Framework for Assessing Github Copilot Output [0.0]
GitHub Copilot introduces new challenges in how these software tools address human needs.<n>I analyzed GitHub Copilot's interaction with users through its chat interface.<n>I established a human-centered requirements framework with clear metrics to evaluate these qualities.
arXiv Detail & Related papers (2025-08-05T21:33:23Z)
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs)<n>Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
Benchmarking ChatGPT, Codeium, and GitHub Copilot: A Comparative Study of AI-Driven Programming and Debugging Assistants [0.0]
Large language models (LLMs) have become essential for tasks like code generation, bug fixing, and optimization. This paper presents a comparative study of ChatGPT, Codeium, and GitHub Copilot, evaluating their performance on LeetCode problems.
arXiv Detail & Related papers (2024-09-30T03:53:40Z)
Hierarchical Neural Program Synthesis [19.94176152035497]
Program synthesis aims to automatically construct human-readable programs that satisfy given task specifications. We present a scalable program synthesis framework that instead synthesizes a program by hierarchically composing programs. We extensively evaluate our proposed framework in a string transformation domain with input/output pairs.
arXiv Detail & Related papers (2023-03-09T18:20:07Z)
BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization. We provide a general benchmark with a diversity of real and synthetic Java bugs. We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
A Conversational Paradigm for Program Synthesis [110.94409515865867]
We propose a conversational program synthesis approach via large language models. We train a family of large language models, called CodeGen, on natural language and programming language data. Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
arXiv Detail & Related papers (2022-03-25T06:55:15Z)
Iterative Genetic Improvement: Scaling Stochastic Program Synthesis [11.195558777385259]
Program synthesis aims to it automatically find programs from an underlying programming language that satisfy a given specification. Existing program synthesis techniques do not meet this expectation very well, suffering from the scalability issue. Here we propose a new framework for program synthesis, called iterative genetic improvement to overcome this problem.
arXiv Detail & Related papers (2022-02-26T02:00:35Z)
Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
Recent Developments in Program Synthesis with Evolutionary Algorithms [1.8047694351309207]
We identify the relevant evolutionary program synthesis approaches and provide an in-depth analysis of their performance. The most influential approaches we identify are stack-based, grammar-guided, as well as linear genetic programming. For future work, we encourage researchers not only to use a program's output for assessing the quality of a solution but also the way towards a solution.
arXiv Detail & Related papers (2021-08-27T11:38:27Z)
Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages [97.58968222942173]
We take the first step to synthesize C programs from input-output examples. In particular, we propose La Synth, which learns the latent representation to approximate the execution of partially generated programs. We show that training on these synthesized programs further improves the prediction performance for both Karel and C program synthesis.
arXiv Detail & Related papers (2021-06-29T02:21:32Z)
Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
Code Building Genetic Programming [0.0]
We introduce Code Building Genetic Programming (CBGP) as a framework within which this can be done. CBGP produces a computational graph that can be executed or translated into source code of a host language.
arXiv Detail & Related papers (2020-08-09T04:33:04Z)
Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis [81.54148730967394]
We propose SED, a neural program generation framework that incorporates synthesis, execution, and debug stages. SED first produces initial programs using the neural program synthesizer component, then utilizes a neural program debugger to iteratively repair the generated programs. On Karel, a challenging input-output program synthesis benchmark, SED reduces the error rate of the neural program synthesizer itself by a considerable margin, and outperforms the standard beam search for decoding.
arXiv Detail & Related papers (2020-07-16T04:15:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.