Secure Coding with AI, From Creation to Inspection
- URL: http://arxiv.org/abs/2504.20814v1
- Date: Tue, 29 Apr 2025 14:30:14 GMT
- Title: Secure Coding with AI, From Creation to Inspection
- Authors: Vladislav Belozerov, Peter J Barclay, Ashkan Sami
- Abstract summary: This paper examines the security of code generated by ChatGPT based on real developer interactions. We analysed 1,586 C, C++, and C# code snippets using static scanners, which detected potential issues in 124 files. ChatGPT successfully detected 18 out of 32 security issues and resolved 17 issues.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While prior studies have explored the security of code generated by ChatGPT and other Large Language Models, they were conducted in controlled experimental settings and did not use code generated or provided in actual developer interactions. This paper not only examines the security of code generated by ChatGPT based on real developer interactions, curated in the DevGPT dataset, but also assesses ChatGPT's capability to find and fix these vulnerabilities. We analysed 1,586 C, C++, and C# code snippets using static scanners, which detected potential issues in 124 files. After manual analysis, we selected 26 files with 32 confirmed vulnerabilities for further investigation. We submitted these files to ChatGPT via the OpenAI API, asking it to detect security issues, identify the corresponding Common Weakness Enumeration (CWE) numbers, and propose fixes. The responses and modified code were manually reviewed and re-scanned for vulnerabilities. ChatGPT successfully detected 18 out of 32 security issues and resolved 17 of them, but failed to recognize or fix the remainder. Interestingly, only 10 of the vulnerabilities resulted from the user prompts, while 22 were introduced by ChatGPT itself. We highlight for developers that code generated by ChatGPT is more likely to contain vulnerabilities than their own code. Furthermore, ChatGPT at times reports incorrect information with apparent confidence, which may mislead less experienced developers. Consistent with previous studies, our findings demonstrate that ChatGPT is neither sufficiently reliable for generating secure code nor able to identify all vulnerabilities, underscoring the continuing importance of static scanners and manual review.
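The scan-then-ask loop the abstract describes can be sketched in a few lines. The following is a minimal illustration rather than the authors' actual harness: the abstract names neither the static scanners nor the model version, so Cppcheck, the gpt-4o model name, and the prompt wording below are all assumptions.

```python
# Minimal sketch of the scan-then-ask loop described in the abstract.
# Assumptions (not stated above): Cppcheck as the static scanner, the
# "gpt-4o" model name, and the exact prompt wording.
import subprocess

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def scan_file(path: str) -> str:
    """Run Cppcheck on one C/C++ file and return its diagnostics."""
    result = subprocess.run(
        ["cppcheck", "--enable=warning,style", path],
        capture_output=True,
        text=True,
    )
    return result.stderr  # Cppcheck reports findings on stderr


def ask_model(path: str) -> str:
    """Ask the model to flag issues, name CWE numbers, and propose fixes."""
    with open(path, encoding="utf-8") as f:
        code = f.read()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; the paper only says "via the OpenAI API"
        messages=[{
            "role": "user",
            "content": (
                "Review the following code for security vulnerabilities. "
                "For each issue, give the CWE number and a fixed version "
                "of the affected code.\n\n" + code
            ),
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    target = "snippet.c"  # hypothetical input file
    print(scan_file(target))  # scanner findings before involving the model
    print(ask_model(target))  # reported issues, CWE numbers, proposed fixes
```

Whatever a loop like this returns, the paper's results argue for treating it as a first pass only: the authors re-scanned and manually reviewed every proposed fix before accepting it.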
Related papers
- Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code [4.605779671279481]
We analyzed 1,152 Developer-ChatGPT conversations across 1,012 issues on GitHub.
ChatGPT is primarily utilized for ideation, whereas its usage for validation is minimal.
ChatGPT-generated code was used as-is to resolve only 5.83% of the issues.
arXiv Detail & Related papers (2024-12-09T18:47:31Z)
- Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks? [10.389763758883975]
Recent studies proposed utilizing ChatGPT as both a developer and tester for collaborative software development. We conduct a comprehensive empirical investigation to evaluate ChatGPT's self-verification capability in code generation, code completion, and program repair.
arXiv Detail & Related papers (2024-05-21T09:47:33Z)
- Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers [4.320393382724067]
This study empirically compares the vulnerabilities of ChatGPT and StackOverflow snippets.
ChatGPT snippets contained 248 vulnerabilities compared to the 302 found in SO snippets, roughly 20% fewer, a statistically significant difference (a sketch of such a significance test appears after this list).
Our findings suggest developers are under-educated on insecure code propagation from both platforms.
arXiv Detail & Related papers (2024-03-22T20:06:41Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective at detecting machine-generated code.
Our method is robust against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- Evaluating the Impact of ChatGPT on Exercises of a Software Security Course [2.3017018980874617]
ChatGPT can identify 20 of the 28 vulnerabilities we inserted in the web application in a white-box setting.
ChatGPT makes nine satisfactory penetration testing and fixing recommendations for the ten vulnerabilities we want students to fix.
arXiv Detail & Related papers (2023-09-18T18:53:43Z)
- Prompt-Enhanced Software Vulnerability Detection Using ChatGPT [9.35868869848051]
Large language models (LLMs) like GPT have received considerable attention for their impressive capabilities.
This paper studies the performance of software vulnerability detection using ChatGPT under a range of prompt designs; a sketch of one such prompt appears after this list.
arXiv Detail & Related papers (2023-08-24T10:30:33Z)
- ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have attained infrastructural status.
Existing benchmarks encounter two challenges: (1) Disregard for periodical evaluation and (2) Lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks, collected from March 2023 to the present.
arXiv Detail & Related papers (2023-04-27T11:33:48Z)
- In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT [44.51625917839939]
ChatGPT has attracted more than 100 million users within a short period of time.
We perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario.
We find that ChatGPT's reliability varies across different domains, especially underperforming in law and science questions.
arXiv Detail & Related papers (2023-04-18T13:20:45Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
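The "statistically significant difference" reported in the ChatGPT-versus-StackOverflow comparison above can be illustrated with a standard two-proportion z-test. The summary does not say which test the authors used or how many snippets each corpus contained, so both the choice of test and the snippet counts below are assumptions made purely for illustration.

```python
# Hedged illustration: a two-proportion z-test on assumed snippet counts.
# Only the vulnerability totals (248 vs. 302) come from the summary above;
# the corpus sizes and the choice of test are assumptions.
from math import sqrt
from statistics import NormalDist

vulns = (248, 302)       # ChatGPT vs. StackOverflow (from the summary)
snippets = (2000, 2000)  # assumed corpus sizes, for illustration only

p1, p2 = vulns[0] / snippets[0], vulns[1] / snippets[1]
pooled = sum(vulns) / sum(snippets)
se = sqrt(pooled * (1 - pooled) * (1 / snippets[0] + 1 / snippets[1]))
z = (p1 - p2) / se
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value

# With these assumed counts: z is about -2.5 and p about 0.01,
# i.e. the gap would be significant at the 5% level.
print(f"z = {z:.2f}, p = {p_value:.4f}")
```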
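Similarly, the prompt-design idea in the "Prompt-Enhanced Software Vulnerability Detection Using ChatGPT" entry can be made concrete. Its summary does not disclose the actual prompts, so the auditor framing, the two-part answer format, and the helper name below are illustrative assumptions, not the paper's method.

```python
# Illustrative only: one possible structured prompt for vulnerability
# detection; the paper's actual prompt designs are not given in its summary.
def build_detection_prompt(code: str, language: str = "C") -> str:
    """Compose a prompt asking for a verdict plus CWE numbers."""
    return (
        f"You are a security auditor reviewing {language} code.\n"
        "Answer in two parts:\n"
        "1. VERDICT: 'vulnerable' or 'not vulnerable'.\n"
        "2. If vulnerable, list each CWE number with a one-line reason.\n\n"
        f"Code to review:\n{code}\n"
    )


if __name__ == "__main__":
    # Hypothetical snippet with a classic unchecked strcpy (CWE-120).
    snippet = "char buf[8];\nstrcpy(buf, user_input);"
    print(build_detection_prompt(snippet))
```

A structured output format like this makes responses easier to score automatically, which matters when, as the DevGPT study above found, the model sometimes asserts incorrect information with apparent confidence.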