Evaluating the Impact of ChatGPT on Exercises of a Software Security
Course
- URL: http://arxiv.org/abs/2309.10085v1
- Date: Mon, 18 Sep 2023 18:53:43 GMT
- Title: Evaluating the Impact of ChatGPT on Exercises of a Software Security
Course
- Authors: Jingyue Li, Per Håkon Meland, Jakob Svennevik Notland, André Storhaug, and Jostein Hjortland Tysse
- Abstract summary: ChatGPT can identify 20 of the 28 vulnerabilities we inserted in the web application in a white-box setting.
ChatGPT makes nine satisfactory penetration testing and fixing recommendations for the ten vulnerabilities we want students to fix.
- Score: 2.3017018980874617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Along with the development of large language models (LLMs), e.g.,
ChatGPT, many existing approaches and tools for software security are
changing. It is, therefore, essential to understand how security-aware these
models are and how they impact software security practices and education. In
the exercises of a software security course at our university, we ask
students to identify and fix vulnerabilities we insert into a web application
using state-of-the-art tools. After the release of ChatGPT, especially the
GPT-4 version of the model, we wanted to know how students could use ChatGPT
to complete the exercise tasks. We input the vulnerable code to ChatGPT and
measured its accuracy in vulnerability identification and fixing. In
addition, we investigated whether ChatGPT can provide a proper source of
information to support its outputs. Results show that ChatGPT identified 20
of the 28 vulnerabilities we inserted into the web application in a white-box
setting, reported three false positives, and found four extra vulnerabilities
beyond the ones we inserted. ChatGPT made nine satisfactory penetration
testing and fixing recommendations for the ten vulnerabilities we want
students to fix and could often point to related sources of information.
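To make the measurement procedure concrete, below is a minimal sketch of how vulnerable code can be sent to a GPT-4-class model and its review collected. This is an illustration only, not the authors' harness: the openai Python client is a real package, but the prompt wording, model name, and the manual scoring step are assumptions.

```python
# Minimal sketch of the white-box evaluation loop described in the
# abstract: submit each vulnerable snippet to the model and collect its
# vulnerability report. The prompt wording and model name are
# assumptions, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are reviewing code for security vulnerabilities. Identify any "
    "vulnerabilities in the following code, name the weakness type, and "
    "propose a fix:\n\n{code}"
)

def review_snippet(code: str) -> str:
    """Ask the model for a white-box vulnerability review of one snippet."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(code=code)}],
    )
    return response.choices[0].message.content

# Each report would then be compared manually against the ground-truth
# list of inserted vulnerabilities to count hits, misses, and false
# positives (e.g., the paper's 20/28 identified and 3 false positives).
```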
Related papers
- A Qualitative Study on Using ChatGPT for Software Security: Perception vs. Practicality [1.7624347338410744]
ChatGPT is a Large Language Model (LLM) that can perform a variety of tasks with remarkable semantic understanding and accuracy.
This study aims to gain an understanding of the potential of ChatGPT as an emerging technology for supporting software security.
It was determined that security practitioners view ChatGPT as beneficial for various software security tasks, including vulnerability detection, information retrieval, and penetration testing.
arXiv Detail & Related papers (2024-08-01T10:14:05Z)
- Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low-quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
- Pros and Cons! Evaluating ChatGPT on Software Vulnerability [0.0]
We evaluate ChatGPT using Big-Vul, covering five common software vulnerability tasks.
We found that the existing state-of-the-art methods are generally superior to ChatGPT in software vulnerability detection.
ChatGPT exhibits limited vulnerability repair capabilities both with and without context information.
arXiv Detail & Related papers (2024-04-05T10:08:34Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- ChatGPT for Vulnerability Detection, Classification, and Repair: How Far Are We? [24.61869093475626]
Large language models (LLMs) like ChatGPT have exhibited remarkable advances in a range of software engineering tasks.
We compare ChatGPT with state-of-the-art language models designed for software vulnerability tasks.
We found that ChatGPT achieves limited performance, trailing behind other language models in vulnerability contexts by a significant margin.
arXiv Detail & Related papers (2023-10-15T12:01:35Z)
- When ChatGPT Meets Smart Contract Vulnerability Detection: How Far Are We? [34.61179425241671]
We present an empirical study to investigate the performance of ChatGPT in identifying smart contract vulnerabilities.
ChatGPT achieves a high recall rate, but its precision in pinpointing smart contract vulnerabilities is limited.
Our research provides insights into the strengths and weaknesses of employing large language models, specifically ChatGPT, for the detection of smart contract vulnerabilities.
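To make the recall-versus-precision contrast concrete, a minimal sketch with made-up counts (not the study's data):

```python
# Illustrative precision/recall computation for a vulnerability detector.
# The counts are invented for the example, not taken from the study.
true_positives = 18   # real vulnerabilities the detector flagged
false_negatives = 2   # real vulnerabilities it missed
false_positives = 12  # benign code it incorrectly flagged

recall = true_positives / (true_positives + false_negatives)     # 0.90: high
precision = true_positives / (true_positives + false_positives)  # 0.60: limited
print(f"recall={recall:.2f}, precision={precision:.2f}")
```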
arXiv Detail & Related papers (2023-09-11T15:02:44Z)
- Prompt-Enhanced Software Vulnerability Detection Using ChatGPT [9.35868869848051]
Large language models (LLMs) like GPT have received considerable attention due to their stunning intelligence.
This paper launches a study on the performance of software vulnerability detection using ChatGPT with different prompt designs.
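As a flavor of what different prompt designs can look like for this task, a hedged sketch (these templates are illustrative; the paper's actual prompts are not reproduced here):

```python
# Hypothetical prompt variants for LLM-based vulnerability detection.
# None of these are the paper's prompts; they only illustrate the idea
# that prompt structure is an experimental variable.
BASIC = "Is the following code vulnerable? Answer yes or no.\n\n{code}"

ROLE_AUGMENTED = (
    "You are an experienced security auditor.\n"
    "Is the following code vulnerable? Answer yes or no.\n\n{code}"
)

STEP_BY_STEP = (
    "Analyze the following code step by step: trace untrusted inputs, "
    "check each dangerous operation, then answer 'vulnerable' or "
    "'not vulnerable'.\n\n{code}"
)

snippet = "strcpy(buf, user_input);  /* unchecked copy */"
for name, template in [("basic", BASIC), ("role", ROLE_AUGMENTED),
                       ("step-by-step", STEP_BY_STEP)]:
    print(f"--- {name} ---\n{template.format(code=snippet)}\n")
```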
arXiv Detail & Related papers (2023-08-24T10:30:33Z)
- In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT [44.51625917839939]
ChatGPT has attracted more than 100 million users within a short period of time.
We perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario.
We find that ChatGPT's reliability varies across different domains, especially underperforming in law and science questions.
arXiv Detail & Related papers (2023-04-18T13:20:45Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as among the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results demonstrate that ChatGPT performs worse than previous models across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)