Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories
- URL: http://arxiv.org/abs/2510.26103v1
- Date: Thu, 30 Oct 2025 03:29:06 GMT
- Title: Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories
- Authors: Maximilian Schreiber, Pascal Tippe,
- Abstract summary: This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories.<n>We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT, GitHub Copilot, Amazon CodeWhisperer, and Tabnine.<n>Using CodeQL static analysis, we identified 4,241 Common Weaknession (CWE) instances across 77 distinct vulnerability types.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories. We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT (91.52\%), GitHub Copilot (7.50\%), Amazon CodeWhisperer (0.52\%), and Tabnine (0.46\%). Using CodeQL static analysis, we identified 4,241 Common Weakness Enumeration (CWE) instances across 77 distinct vulnerability types. Our findings reveal that while 87.9\% of AI-generated code does not contain identifiable CWE-mapped vulnerabilities, significant patterns emerge regarding language-specific vulnerabilities and tool performance. Python consistently exhibited higher vulnerability rates (16.18\%-18.50\%) compared to JavaScript (8.66\%-8.99\%) and TypeScript (2.50\%-7.14\%) across all tools. We observed notable differences in security performance, with GitHub Copilot achieving better security density for Python (1,739 LOC per CWE) and TypeScript, while ChatGPT performed better for JavaScript. Additionally, we discovered widespread use of AI tools for documentation generation (39\% of collected files), an understudied application with implications for software maintainability. These findings extend previous work with a significantly larger dataset and provide valuable insights for developing language-specific and context-aware security practices for the responsible integration of AI-generated code into software development workflows.
Related papers
- CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability [50.57373283154859]
We present CVE-Factory, the first multiagent framework to achieve expert-level quality in automatically transforming vulnerability tasks.<n>It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2% verified success.<n>We synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security.
arXiv Detail & Related papers (2026-02-03T02:27:16Z) - SAGA: Detecting Security Vulnerabilities Using Static Aspect Analysis [5.971445533193919]
SAGA is an approach to detect and locate vulnerabilities in Python source code in a versatile way.<n>We have evaluated SAGA on a dataset of 108 vulnerabilities, obtaining 100% sensitivity and 99.15% specificity, with only one false positive, while outperforming four common security analysis tools.
arXiv Detail & Related papers (2026-01-21T16:26:26Z) - Trace: Securing Smart Contract Repository Against Access Control Vulnerability [58.02691083789239]
GitHub hosts numerous smart contract repositories containing source code, documentation, and configuration files.<n>Third-party developers often reference, reuse, or fork code from these repositories during custom development.<n>Existing tools for detecting smart contract vulnerabilities are limited in their ability to handle complex repositories.
arXiv Detail & Related papers (2025-10-22T05:18:28Z) - Eradicating the Unseen: Detecting, Exploiting, and Remediating a Path Traversal Vulnerability across GitHub [1.2124551005857036]
Vulnerabilities in open-source software can cause cascading effects in the modern digital ecosystem.<n>We identified 1,756 vulnerable open-source projects, some of which are very influential.<n>We have responsibly disclosed the vulnerability to the maintainers, and 14% of the reported vulnerabilities have been remediated.
arXiv Detail & Related papers (2025-05-26T16:29:21Z) - BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems [62.17474934536671]
We introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems.<n>To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a specific vulnerability), and Patch (patching a specific vulnerability)<n>We evaluate 8 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, and DeepSeek-R1.
arXiv Detail & Related papers (2025-05-21T07:44:52Z) - Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z) - FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
arXiv Detail & Related papers (2024-05-21T19:54:19Z) - Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers [4.320393382724067]
This study empirically compares the vulnerabilities of ChatGPT and StackOverflow snippets.
ChatGPT contained 248 vulnerabilities compared to the 302 vulnerabilities found in SO snippets, producing 20% fewer vulnerabilities with a statistically significant difference.
Our findings suggest developers are under-educated on insecure code propagation from both platforms.
arXiv Detail & Related papers (2024-03-22T20:06:41Z) - Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study [8.364612094301071]
We analyze code snippets generated by GitHub Copilot and two other AI code generation tools from GitHub projects.<n>Our analysis identified 733 snippets, revealing a high likelihood of security weaknesses, with 29.5% of Python and 24.2% of JavaScript snippets affected.<n>We provide suggestions for mitigating security issues in generated code.
arXiv Detail & Related papers (2023-10-03T14:01:28Z) - Can Large Language Models Find And Fix Vulnerable Software? [0.0]
GPT-4 identified approximately four times the vulnerabilities than its counterparts.
It provided viable fixes for each vulnerability, demonstrating a low rate of false positives.
GPT-4's code corrections led to a 90% reduction in vulnerabilities, requiring only an 11% increase in code lines.
arXiv Detail & Related papers (2023-08-20T19:33:12Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.