Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in
Code?
- URL: http://arxiv.org/abs/2204.04741v5
- Date: Sat, 6 Jan 2024 02:37:29 GMT
- Title: Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in
Code?
- Authors: Owura Asare, Meiyappan Nagappan, N. Asokan
- Abstract summary: We perform a comparative empirical analysis of Copilot-generated code from a security perspective.
We investigate whether Copilot is just as likely to introduce the same software vulnerabilities as human developers.
- Score: 12.350130201627186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several advances in deep learning have been successfully applied to the
software development process. Of recent interest is the use of neural language
models to build tools, such as Copilot, that assist in writing code. In this
paper we perform a comparative empirical analysis of Copilot-generated code
from a security perspective. The aim of this study is to determine if Copilot
is as bad as human developers. We investigate whether Copilot is just as likely
to introduce the same software vulnerabilities as human developers. Using a
dataset of C/C++ vulnerabilities, we prompt Copilot to generate suggestions in
scenarios that led to the introduction of vulnerabilities by human developers.
The suggestions are inspected and categorized in a 2-stage process based on
whether the original vulnerability or fix is reintroduced. We find that Copilot
replicates the original vulnerable code about 33% of the time while replicating
the fixed code at a 25% rate. However, this behaviour is not consistent: Copilot
is more likely to introduce some types of vulnerabilities than others and is
also more likely to generate vulnerable code in response to prompts that
correspond to older vulnerabilities. Overall, given that in a significant
number of cases it did not replicate the vulnerabilities previously introduced
by human developers, we conclude that Copilot, despite performing differently
across various vulnerability types, is not as bad as human developers at
introducing vulnerabilities in code.
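The categorization step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the two stages, so the code below assumes stage one is an exact text match (after whitespace normalization) against the original vulnerable or fixed code, and stage two falls back to a label supplied by a human reviewer. The function names `categorize` and `replication_rates`, the label strings, and the sample snippets in the usage note are all hypothetical and are not taken from the paper.

```python
from collections import Counter

CATEGORIES = ("vulnerable", "fixed", "other")


def _normalize(code):
    """Collapse all runs of whitespace so formatting differences are ignored."""
    return " ".join(code.split())


def categorize(suggestion, vulnerable_code, fixed_code, manual_label=None):
    """Label one suggestion as 'vulnerable', 'fixed', or 'other'.

    Stage 1 (assumed): exact match after whitespace normalization.
    Stage 2 (assumed): use a reviewer-supplied label when no exact match exists.
    """
    normalized = _normalize(suggestion)
    if normalized == _normalize(vulnerable_code):
        return "vulnerable"
    if normalized == _normalize(fixed_code):
        return "fixed"
    if manual_label in ("vulnerable", "fixed"):
        return manual_label
    return "other"


def replication_rates(labels):
    """Return the fraction of labels that fall into each category."""
    total = len(labels)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    counts = Counter(labels)
    return {category: counts[category] / total for category in CATEGORIES}
```

As a usage illustration with made-up snippets, `categorize("strcpy(buf,  src);", "strcpy(buf, src);", "strncpy(buf, src, n);")` is labelled as a replication of the vulnerable code, and `replication_rates` turns a list of such labels into per-category fractions comparable in form to the 33% and 25% figures reported above.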
Related papers
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks.
More than 35,000 packages were forced to update their Log4J libraries with the latest version.
It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerability-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z)
- GitHub Copilot: the perfect Code compLeeter? [3.708656266586145]
This paper aims to evaluate GitHub Copilot's generated code quality based on the LeetCode problem set.
We evaluate Copilot's reliability in the code generation stage, the correctness of the generated code and its dependency on the programming language.
arXiv Detail & Related papers (2024-06-17T08:38:29Z)
- Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot [46.822148186169144]
GitHub Copilot is an AI-enabled tool that automates program synthesis.
Recent studies have extensively examined Copilot's capabilities in various programming tasks.
However, little is known about the effect of different natural languages on code suggestion.
arXiv Detail & Related papers (2024-02-02T14:30:02Z)
- Assessing the Security of GitHub Copilot Generated Code -- A Targeted Replication Study [11.644996472213611]
Recent studies have investigated security issues in AI-powered code generation tools such as GitHub Copilot and Amazon CodeWhisperer.
This paper replicates the study of Pearce et al., which investigated security weaknesses in Copilot and uncovered several weaknesses in the code suggested by Copilot.
Our results indicate that, even with the improvements in newer versions of Copilot, the percentage of vulnerable code suggestions has reduced from 36.54% to 27.25%.
arXiv Detail & Related papers (2023-11-18T22:12:59Z)
- Security Weaknesses of Copilot Generated Code in GitHub [8.364612094301071]
We analyze code snippets generated by GitHub Copilot from GitHub projects.
Our analysis identified 452 snippets generated by Copilot, revealing a high likelihood of security issues.
It also shows that practitioners should cultivate corresponding security awareness and skills.
arXiv Detail & Related papers (2023-10-03T14:01:28Z)
- A User-centered Security Evaluation of Copilot [12.350130201627186]
We evaluate GitHub's Copilot to better understand its strengths and weaknesses with respect to code security.
We find that access to Copilot accompanies a more secure solution when tackling harder problems.
arXiv Detail & Related papers (2023-08-12T14:49:46Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions [8.285068188878578]
GitHub Copilot is a language model trained over open-source GitHub code.
Code often contains bugs - and so, it is certain that the language model will have learned from exploitable, buggy code.
This raises concerns on the security of Copilot's code contributions.
arXiv Detail & Related papers (2021-08-20T17:30:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.