An Empirical Cybersecurity Evaluation of GitHub Copilot's Code
Contributions
- URL: http://arxiv.org/abs/2108.09293v2
- Date: Mon, 23 Aug 2021 23:52:51 GMT
- Title: An Empirical Cybersecurity Evaluation of GitHub Copilot's Code
Contributions
- Authors: Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt,
Ramesh Karri
- Abstract summary: GitHub Copilot is a language model trained over open-source GitHub code.
Code often contains bugs - and so, it is certain that the language model will have learned from exploitable, buggy code.
This raises concerns on the security of Copilot's code contributions.
- Score: 8.285068188878578
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There is burgeoning interest in designing AI-based systems to assist humans
in designing computing systems, including tools that automatically generate
computer code. The most notable of these comes in the form of the first
self-described 'AI pair programmer', GitHub Copilot, a language model trained
over open-source GitHub code. However, code often contains bugs - and so, given
the vast quantity of unvetted code that Copilot has processed, it is certain
that the language model will have learned from exploitable, buggy code. This
raises concerns on the security of Copilot's code contributions. In this work,
we systematically investigate the prevalence and conditions that can cause
GitHub Copilot to recommend insecure code. To perform this analysis we prompt
Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those
from MITRE's "Top 25" list). We explore Copilot's performance on three distinct
code generation axes -- examining how it performs given diversity of
weaknesses, diversity of prompts, and diversity of domains. In total, we
produce 89 different scenarios for Copilot to complete, producing 1,692
programs. Of these, we found approximately 40% to be vulnerable.
Related papers
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- GitHub Copilot: the perfect Code compLeeter? [3.708656266586145]
This paper aims to evaluate GitHub Copilot's generated code quality based on the LeetCode problem set.
We evaluate Copilot's reliability in the code generation stage, the correctness of the generated code and its dependency on the programming language.
arXiv Detail & Related papers (2024-06-17T08:38:29Z)
- Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot [46.822148186169144]
GitHub Copilot is an AI-enabled tool that automates program synthesis.
Recent studies have extensively examined Copilot's capabilities in various programming tasks.
However, little is known about the effect of different natural languages on code suggestion.
arXiv Detail & Related papers (2024-02-02T14:30:02Z)
- Exploring the Problems, their Causes and Solutions of AI Pair Programming: A Study on GitHub and Stack Overflow [6.724815667295355]
GitHub Copilot, the AI pair programmer, utilizes machine learning models trained on a large corpus of code snippets to generate code suggestions.
Despite its popularity in software development, there is limited empirical evidence on the actual experiences of practitioners who work with Copilot.
We collected data from 473 GitHub issues, 706 GitHub discussions, and 142 Stack Overflow posts.
arXiv Detail & Related papers (2023-11-02T06:24:38Z)
- Security Weaknesses of Copilot Generated Code in GitHub [8.364612094301071]
We analyze code snippets generated by GitHub Copilot from GitHub projects.
Our analysis identified 452 snippets generated by Copilot, revealing a high likelihood of security issues.
It also shows that practitioners should cultivate corresponding security awareness and skills.
arXiv Detail & Related papers (2023-10-03T14:01:28Z)
- Demystifying Practices, Challenges and Expected Features of Using GitHub Copilot [3.655281304961642]
We conducted an empirical study by collecting and analyzing the data from Stack Overflow (SO) and GitHub Discussions.
We identified the programming languages, technologies used with Copilot, functions implemented, benefits, limitations, and challenges when using Copilot.
Our results suggest that using Copilot is like a double-edged sword, which requires developers to carefully consider various aspects when deciding whether or not to use it.
arXiv Detail & Related papers (2023-09-11T16:39:37Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot [112.21008828205409]
Comma.ai claims one $999 aftermarket device mounted with a single camera and board inside owns the ability to handle L2 scenarios.
Together with open-sourced software of the entire system released by Comma.ai, the project is named Openpilot.
In this report, we would like to share our latest findings, shed some light on the new perspective of end-to-end autonomous driving from an industrial product-level side.
arXiv Detail & Related papers (2022-06-16T13:43:52Z)
- Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code? [12.350130201627186]
We perform a comparative empirical analysis of Copilot-generated code from a security perspective.
We investigate whether Copilot is just as likely to introduce the same software vulnerabilities as human developers.
arXiv Detail & Related papers (2022-04-10T18:32:04Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.