Related papers: Assessing the Security of GitHub Copilot Generated Code -- A Targeted Replication Study

Assessing the Security of GitHub Copilot Generated Code -- A Targeted Replication Study

URL: http://arxiv.org/abs/2311.11177v1
Date: Sat, 18 Nov 2023 22:12:59 GMT
Title: Assessing the Security of GitHub Copilot Generated Code -- A Targeted Replication Study
Authors: Vahid Majdinasab and Michael Joshua Bishop and Shawn Rasheed and Arghavan Moradidakhel and Amjed Tahir and Foutse Khomh
Abstract summary: Recent studies have investigated security issues in AI-powered code generation tools such as GitHub Copilot and Amazon CodeWhisperer. This paper replicates the study of Pearce et al., which investigated security weaknesses in Copilot and uncovered several weaknesses in the code suggested by Copilot. Our results indicate that, even with the improvements in newer versions of Copilot, the percentage of vulnerable code suggestions has reduced from 36.54% to 27.25%.
Score: 11.644996472213611
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI-powered code generation models have been developing rapidly, allowing developers to expedite code generation and thus improve their productivity. These models are trained on large corpora of code (primarily sourced from public repositories), which may contain bugs and vulnerabilities. Several concerns have been raised about the security of the code generated by these models. Recent studies have investigated security issues in AI-powered code generation tools such as GitHub Copilot and Amazon CodeWhisperer, revealing several security weaknesses in the code generated by these tools. As these tools evolve, it is expected that they will improve their security protocols to prevent the suggestion of insecure code to developers. This paper replicates the study of Pearce et al., which investigated security weaknesses in Copilot and uncovered several weaknesses in the code suggested by Copilot across diverse scenarios and languages (Python, C and Verilog). Our replication examines Copilot security weaknesses using newer versions of Copilot and CodeQL (the security analysis framework). The replication focused on the presence of security vulnerabilities in Python code. Our results indicate that, even with the improvements in newer versions of Copilot, the percentage of vulnerable code suggestions has reduced from 36.54% to 27.25%. Nonetheless, it remains evident that the model still suggests insecure code.

Related papers

RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation. RedCode-Exec provides challenging prompts that could lead to risky code execution. RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that many LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair. However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities. We aim to present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development. We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM) We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
We introduce a multipurpose code vulnerability analysis system textttSecRepair, powered by a large language model, CodeGen2. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z)
Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation [24.668682498171776]
Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. This paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective.
arXiv Detail & Related papers (2023-10-25T00:32:56Z)
Security Weaknesses of Copilot Generated Code in GitHub [8.364612094301071]
We analyze code snippets generated by GitHub Copilot from GitHub projects. Our analysis identified 452 snippets generated by Copilot, revealing a high likelihood of security issues. It also shows that practitioners should cultivate corresponding security awareness and skills.
arXiv Detail & Related papers (2023-10-03T14:01:28Z)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code? [12.350130201627186]
We perform a comparative empirical analysis of Copilot-generated code from a security perspective. We investigate whether Copilot is just as likely to introduce the same software vulnerabilities as human developers.
arXiv Detail & Related papers (2022-04-10T18:32:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.