Measuring the Runtime Performance of C++ Code Written by Humans using GitHub Copilot
- URL: http://arxiv.org/abs/2305.06439v2
- Date: Wed, 11 Dec 2024 21:52:23 GMT
- Title: Measuring the Runtime Performance of C++ Code Written by Humans using GitHub Copilot
- Authors: Daniel Erhabor, Sreeharsha Udayashankar, Meiyappan Nagappan, Samer Al-Kiswany
- Abstract summary: We evaluate the runtime performance of C++ code produced when developers use GitHub Copilot versus when they do not.
Our results suggest that using Copilot may produce C++ code with (statistically significant) slower runtime performance.
- Abstract: GitHub Copilot is an artificially intelligent programming assistant used by many developers. While a few studies have evaluated the security risks of using Copilot, no study has examined whether it helps developers produce code with better runtime performance. We evaluate the runtime performance of C++ code produced when developers use GitHub Copilot versus when they do not. To this end, we conducted a user study with 32 participants in which each participant solved two C++ programming problems, one with Copilot and the other without it, and we measured the runtime performance of the participants' solutions on our test data. Our results suggest that using Copilot may produce C++ code with (statistically significant) slower runtime performance.
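The measurement harness itself is not reproduced in this listing. As a rough sketch of the kind of measurement the abstract describes, the C++ snippet below times a placeholder solve() function over stand-in test inputs with std::chrono and reports the mean wall-clock runtime; solve() and the test data are hypothetical stand-ins, not the study's materials.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Placeholder for a participant's solution; the real solutions in the
// study read problem-specific input and compute an answer.
int solve(const std::string& input) {
    return static_cast<int>(input.size());  // stand-in workload
}

int main() {
    // Stand-ins for the study's test data (assumed shape, not the real set).
    std::vector<std::string> tests = {"a", "bb", "ccc"};

    using clock = std::chrono::steady_clock;
    volatile int sink = 0;  // keeps the optimizer from discarding solve()

    auto start = clock::now();
    for (const auto& t : tests) {
        sink = sink + solve(t);
    }
    auto end = clock::now();

    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "total: " << us << " us, mean: "
              << us / static_cast<double>(tests.size()) << " us/test\n";
    return 0;
}
```

A real comparison would repeat the runs and apply a significance test across the two groups, as the abstract's claim of statistical significance implies.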
Related papers
- Copilot Arena: A Platform for Code LLM Evaluation in the Wild [44.33771124408514]
Copilot Arena is a platform to collect user preferences for code generation through native integration into a developer's working environment.
Copilot Arena has served over 4.5 million suggestions from 10 models and collected over 11k pairwise judgements.
arXiv Detail & Related papers (2025-02-13T13:40:52Z)
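This summary does not say how Copilot Arena turns its 11k pairwise judgements into a ranking; a common way to aggregate such judgements (assumed here, not confirmed as Copilot Arena's method) is an Elo-style rating update, sketched below with hypothetical model names.

```cpp
#include <cmath>
#include <iostream>
#include <map>
#include <string>

// Elo-style update for one pairwise judgement. This is an assumed,
// generic aggregation scheme, not Copilot Arena's documented method.
void update(std::map<std::string, double>& rating,
            const std::string& winner, const std::string& loser,
            double k = 16.0) {
    double rw = rating[winner];  // unseen models default to rating 0
    double rl = rating[loser];
    double p_win = 1.0 / (1.0 + std::pow(10.0, (rl - rw) / 400.0));
    rating[winner] = rw + k * (1.0 - p_win);
    rating[loser]  = rl - k * (1.0 - p_win);
}

int main() {
    std::map<std::string, double> rating;
    // Hypothetical judgements: the first argument is the preferred model.
    update(rating, "model-a", "model-b");
    update(rating, "model-a", "model-c");
    update(rating, "model-c", "model-b");
    for (const auto& [name, r] : rating)
        std::cout << name << ": " << r << '\n';
    return 0;
}
```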
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate whether ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
- GitHub Copilot: the perfect Code compLeeter? [3.708656266586145]
This paper aims to evaluate GitHub Copilot's generated code quality based on the LeetCode problem set.
We evaluate Copilot's reliability in the code generation stage, the correctness of the generated code and its dependency on the programming language.
arXiv Detail & Related papers (2024-06-17T08:38:29Z)
- Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot [46.822148186169144]
GitHub Copilot is an AI-enabled tool that automates program synthesis.
Recent studies have extensively examined Copilot's capabilities in various programming tasks.
However, little is known about the effect of different natural languages on code suggestion.
arXiv Detail & Related papers (2024-02-02T14:30:02Z)
- Exploring the Problems, their Causes and Solutions of AI Pair Programming: A Study on GitHub and Stack Overflow [6.724815667295355]
GitHub Copilot, the AI pair programmer, utilizes machine learning models trained on a large corpus of code snippets to generate code suggestions.
Despite its popularity in software development, there is limited empirical evidence on the actual experiences of practitioners who work with Copilot.
We collected data from 473 GitHub issues, 706 GitHub discussions, and 142 Stack Overflow posts.
arXiv Detail & Related papers (2023-11-02T06:24:38Z)
- Demystifying Practices, Challenges and Expected Features of Using GitHub Copilot [3.655281304961642]
We conducted an empirical study by collecting and analyzing the data from Stack Overflow (SO) and GitHub Discussions.
We identified the programming languages, technologies used with Copilot, functions implemented, benefits, limitations, and challenges when using Copilot.
Our results suggest that using Copilot is a double-edged sword: developers must carefully consider various aspects when deciding whether or not to use it.
arXiv Detail & Related papers (2023-09-11T16:39:37Z)
- Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors.
Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior.
Our visualization approach enhances common remote pair programming tools and supports collaborative use through shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
- GitHub Copilot AI pair programmer: Asset or Liability? [14.572381978575182]
We study the capabilities of Copilot in two different programming tasks.
We compare Copilot's proposed solutions with those of human programmers on a set of programming tasks.
The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems.
arXiv Detail & Related papers (2022-06-30T15:00:03Z)
- Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot [112.21008828205409]
Comma.ai claims that a single $999 aftermarket device, fitted with one camera and an onboard computer, can handle L2 driving scenarios.
Comma.ai has open-sourced the software for the entire system; the project is named Openpilot.
In this report, we share our latest findings and shed light on end-to-end autonomous driving from an industrial, product-level perspective.
arXiv Detail & Related papers (2022-06-16T13:43:52Z)
- An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions [8.285068188878578]
GitHub Copilot is a language model trained over open-source GitHub code.
Code often contains bugs, and so it is certain that the language model will have learned from exploitable, buggy code.
This raises concerns on the security of Copilot's code contributions.
arXiv Detail & Related papers (2021-08-20T17:30:33Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
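Benchmarks like APPS score a generated program by the fraction of test cases it passes (the roughly 15% figure quoted above for GPT-Neo on introductory problems is of this kind). A minimal sketch of that bookkeeping, with a placeholder candidate() standing in for a model-generated solution:

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Placeholder standing in for a model-generated solution.
int candidate(int x) { return x * 2; }

int main() {
    // Hypothetical (input, expected output) test cases.
    std::vector<std::pair<int, int>> tests = {
        {1, 2}, {3, 6}, {5, 11} /* deliberately failing case */};

    int passed = 0;
    for (const auto& [in, expect] : tests)
        if (candidate(in) == expect) ++passed;

    std::cout << "pass rate: "
              << 100.0 * passed / tests.size() << "%\n";
    return 0;
}
```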
This list is automatically generated from the titles and abstracts of the papers in this site.