An Investigation into Misuse of Java Security APIs by Large Language Models
- URL: http://arxiv.org/abs/2404.03823v1
- Date: Thu, 4 Apr 2024 22:52:41 GMT
- Title: An Investigation into Misuse of Java Security APIs by Large Language Models
- Authors: Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar,
- Abstract summary: This paper systematically assesses ChatGPT's trustworthiness in code generation for security API use cases in Java.
Around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified.
For roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.
- Score: 9.453671056356837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding software security, yet effectively integrating security APIs presents substantial challenges. This leads to inadvertent misuse by developers, thereby exposing software to vulnerabilities. To overcome these challenges, developers may seek assistance from LLMs. In this paper, we systematically assess ChatGPT's trustworthiness in code generation for security API use cases in Java. To conduct a thorough evaluation, we compile an extensive collection of 48 programming tasks for 5 widely used security APIs. We employ both automated and manual approaches to effectively detect security API misuse in the code generated by ChatGPT for these tasks. Our findings are concerning: around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. Moreover, for roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.
Related papers
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair.
However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities.
We aim to present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z) - Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers [4.320393382724067]
This study empirically compares the vulnerabilities of ChatGPT and StackOverflow snippets.
ChatGPT contained 248 vulnerabilities compared to the 302 vulnerabilities found in SO snippets, producing 20% fewer vulnerabilities with a statistically significant difference.
Our findings suggest developers are under-educated on insecure code propagation from both platforms.
arXiv Detail & Related papers (2024-03-22T20:06:41Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - CodeChameleon: Personalized Encryption Framework for Jailbreaking Large
Language Models [49.60006012946767]
We propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics.
We conduct extensive experiments on 7 Large Language Models, achieving state-of-the-art average Attack Success Rate (ASR)
Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
arXiv Detail & Related papers (2024-02-26T16:35:59Z) - Finding Vulnerabilities in Mobile Application APIs: A Modular Programmatic Approach [0.0]
Application Programming Interfaces (APIs) are becoming increasingly popular to transfer data in a variety of mobile applications.
These APIs often process sensitive user information through their endpoints, which are potentially exploitable due to developer mis implementation.
This paper created a custom, modular endpoint vulnerability detection tool to analyze information leakage in various mobile Android applications.
arXiv Detail & Related papers (2023-10-22T00:08:51Z) - How well does LLM generate security tests? [8.454827764115631]
Developers often build software on top of third-party libraries (Libs) to improve productivity and software quality.
People refer to such attacks as supply chain attacks, the documented number of which has increased 742% in 2022.
We used ChatGPT-4.0 to generate security tests, and to demonstrate how vulnerable library dependencies facilitate the supply chain attacks to given Apps.
arXiv Detail & Related papers (2023-10-01T16:00:58Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - A Novel Approach to Identify Security Controls in Source Code [4.598579706242066]
This paper enumerates a comprehensive list of commonly used security controls and creates a dataset for each one of them.
It uses the state-of-the-art NLP technique Bidirectional Representations from Transformers (BERT) and the Tactic Detector from our prior work to show that security controls could be identified with high confidence.
arXiv Detail & Related papers (2023-07-10T21:14:39Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.