Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers
- URL: http://arxiv.org/abs/2403.15600v1
- Date: Fri, 22 Mar 2024 20:06:41 GMT
- Title: Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers
- Authors: Sivana Hamer, Marcelo d'Amorim, Laurie Williams
- Abstract summary: This study empirically compares the vulnerabilities of ChatGPT-generated and StackOverflow code snippets.
ChatGPT-generated code contained 248 vulnerabilities compared to the 302 vulnerabilities found in SO snippets, producing 20% fewer vulnerabilities, a statistically significant difference.
Our findings suggest developers are under-educated about insecure code propagation from both platforms.
- Score: 4.320393382724067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sonatype's 2023 report found that 97% of developers and security leads integrate generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), into their development process. Concerns about the security implications of this trend have been raised. Developers are now weighing the benefits and risks of LLMs against other relied-upon information sources, such as StackOverflow (SO), requiring empirical data to inform their choice. In this work, our goal is to raise software developers' awareness of the security implications when selecting code snippets by empirically comparing the vulnerabilities of ChatGPT and StackOverflow. To achieve this, we used an existing Java dataset from SO with security-related questions and answers. Then, we asked ChatGPT the same SO questions, gathering the generated code for comparison. After curating the dataset, we analyzed the number and types of Common Weakness Enumeration (CWE) vulnerabilities of 108 snippets from each platform using CodeQL. ChatGPT-generated code contained 248 vulnerabilities compared to the 302 vulnerabilities found in SO snippets, producing 20% fewer vulnerabilities with a statistically significant difference. Additionally, ChatGPT generated 19 types of CWE, fewer than the 22 found in SO. Our findings suggest developers are under-educated about insecure code propagation from both platforms, as we found 274 unique vulnerabilities and 25 types of CWE. Any code copied and pasted, whether created by AI or humans, cannot be trusted blindly, requiring good software engineering practices to reduce risk. Future work can help minimize insecure code propagation from any platform.
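The paper does not reproduce the snippets it analyzed, so the sketch below is a purely hypothetical illustration of the kind of weakness CodeQL flags in copied Java code: an injectable query (CWE-89, caught by CodeQL's standard java/sql-injection query) next to its parameterized fix. The class and method names are invented for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserLookup {

    // CWE-89: untrusted input concatenated into the SQL string, the kind of
    // pattern CodeQL's java/sql-injection query reports.
    static ResultSet findUserInsecure(Connection conn, String name) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery("SELECT * FROM users WHERE name = '" + name + "'");
    }

    // Parameterized variant: the driver keeps the input out of the SQL grammar,
    // so the same query is no longer injectable.
    static ResultSet findUserSafe(Connection conn, String name) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        stmt.setString(1, name);
        return stmt.executeQuery();
    }
}
```

In a pipeline like the paper's, each snippet would be wrapped into a compilable unit, a database built with `codeql database create --language=java`, and the query suites run with `codeql database analyze`; the exact invocation is an assumption here, not something the abstract specifies.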
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
- Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow [34.79673982473015]
We introduce SOChecker, a tool to identify potential vulnerabilities in incomplete SO smart contract code snippets.
Results show that SOChecker achieves an F1 score of 68.2%, greatly surpassing GPT-3.5 and GPT-4.
Our findings underscore the need to improve the security of code snippets from Q&A websites.
arXiv Detail & Related papers (2024-07-18T08:25:16Z)
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair.
However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities.
We present a comprehensive study that precisely evaluates and enhances the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
- Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low-quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
- An Investigation into Misuse of Java Security APIs by Large Language Models [9.453671056356837]
This paper systematically assesses ChatGPT's trustworthiness in code generation for security API use cases in Java.
Around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified.
For roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.
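The summary above does not name the 20 misuse types, so the following is one hypothetical illustration in the paper's spirit (class and method names invented here): requesting Cipher.getInstance("AES") silently selects ECB mode on most providers, a classic Java Cryptography Architecture misuse (CWE-327), contrasted with authenticated AES-GCM.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class CipherUsage {

    // Misuse: "AES" alone resolves to AES/ECB/PKCS5Padding on most providers;
    // ECB encrypts identical blocks identically and leaks plaintext structure.
    static byte[] encryptMisuse(SecretKey key, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plaintext);
    }

    // Safer: authenticated AES-GCM with a fresh random 12-byte IV per message,
    // prepended to the ciphertext so the receiver can decrypt.
    static byte[] encryptSafer(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }
}
```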
arXiv Detail & Related papers (2024-04-04T22:52:41Z)
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models when prompted with code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
- Security Weaknesses of Copilot Generated Code in GitHub [8.364612094301071]
We analyze code snippets generated by GitHub Copilot from GitHub projects.
Our analysis identified 452 snippets generated by Copilot, revealing a high likelihood of security issues.
The analysis also suggests that practitioners should cultivate corresponding security awareness and skills.
arXiv Detail & Related papers (2023-10-03T14:01:28Z)
- How well does LLM generate security tests? [8.454827764115631]
Developers often build software on top of third-party libraries (Libs) to improve productivity and software quality.
However, vulnerabilities in these libraries can expose applications to attack; people refer to such attacks as supply chain attacks, the documented number of which increased by 742% in 2022.
We used ChatGPT-4.0 to generate security tests and to demonstrate how vulnerable library dependencies facilitate supply chain attacks against given applications.
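The abstract does not show what such a test looks like; as a rough, hypothetical sketch (using JUnit 5 and the JDK's built-in XML parser, not the paper's actual subjects or tooling), a generated security test might probe whether the parser configuration an application inherits blocks external entity resolution (XXE, CWE-611).

```java
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.jupiter.api.Test;
import org.xml.sax.SAXParseException;

class XxeSecurityTest {

    // A classic XXE probe: a doctype whose external entity reads a local file.
    private static final String XXE_PAYLOAD =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
        + "<foo>&xxe;</foo>";

    @Test
    void parserRejectsExternalEntities() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Hardened configuration under test; without this line the payload parses.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        assertThrows(SAXParseException.class, () ->
            dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(XXE_PAYLOAD.getBytes(StandardCharsets.UTF_8))));
    }
}
```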
arXiv Detail & Related papers (2023-10-01T16:00:58Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.