Related papers: An Investigation into Misuse of Java Security APIs by Large Language Models

An Investigation into Misuse of Java Security APIs by Large Language Models

URL: http://arxiv.org/abs/2404.03823v1
Date: Thu, 4 Apr 2024 22:52:41 GMT
Title: An Investigation into Misuse of Java Security APIs by Large Language Models
Authors: Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar,
Abstract summary: This paper systematically assesses ChatGPT's trustworthiness in code generation for security API use cases in Java. Around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. For roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.
Score: 9.453671056356837
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding software security, yet effectively integrating security APIs presents substantial challenges. This leads to inadvertent misuse by developers, thereby exposing software to vulnerabilities. To overcome these challenges, developers may seek assistance from LLMs. In this paper, we systematically assess ChatGPT's trustworthiness in code generation for security API use cases in Java. To conduct a thorough evaluation, we compile an extensive collection of 48 programming tasks for 5 widely used security APIs. We employ both automated and manual approaches to effectively detect security API misuse in the code generated by ChatGPT for these tasks. Our findings are concerning: around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. Moreover, for roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.

Related papers

Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z)
Towards Trustworthy GUI Agents: A Survey [64.6445117343499]
This survey examines the trustworthiness of GUI agents in five critical dimensions. We identify major challenges such as vulnerability to adversarial attacks, cascading failure modes in sequential decision-making. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
arXiv Detail & Related papers (2025-03-30T13:26:00Z)
Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework [58.36391985790157]
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code. We explore the use of large language models (LLMs) to improve exception handling in code. We propose Seeker, a multi-agent framework inspired by expert developer strategies for exception handling.
arXiv Detail & Related papers (2024-12-16T12:35:29Z)
ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution. We show that ExploraCoder significantly improves performance for models lacking prior API knowledge, achieving an absolute increase of 11.24% over niave RAG approaches and 14.07% over pretraining methods in pass@10.
arXiv Detail & Related papers (2024-12-06T19:00:15Z)
Secret Breach Prevention in Software Issue Reports [2.8747015994080285]
This paper presents a novel technique for secret breach detection in software issue reports. We highlight the challenges posed by noise, such as log files, URLs, commit IDs, stack traces, and dummy passwords. We propose an approach combining the strengths of state-of-the-artes with the contextual understanding of language models.
arXiv Detail & Related papers (2024-10-31T06:14:17Z)
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach [54.03528377384397]
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code. We explore the use of large language models (LLMs) to improve exception handling in code. We propose Seeker, a multi agent framework inspired by expert developer strategies for exception handling.
arXiv Detail & Related papers (2024-10-09T14:45:45Z)
Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis [2.899501205987888]
We developed an automated vulnerability root cause (RC) toolkit called T5-RCGCN. It combines T5 language model embeddings with a graph convolutional network (GCN) for vulnerability classification and localization. We tested T5-RCGCN with 56 junior developers across three datasets, showing a 28.9% improvement in code security compared to previous methods.
arXiv Detail & Related papers (2024-08-30T18:26:59Z)
Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers [4.320393382724067]
This study empirically compares the vulnerabilities of ChatGPT and StackOverflow snippets. ChatGPT contained 248 vulnerabilities compared to the 302 vulnerabilities found in SO snippets, producing 20% fewer vulnerabilities with a statistically significant difference. Our findings suggest developers are under-educated on insecure code propagation from both platforms.
arXiv Detail & Related papers (2024-03-22T20:06:41Z)
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models [49.60006012946767]
We propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. We conduct extensive experiments on 7 Large Language Models, achieving state-of-the-art average Attack Success Rate (ASR) Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
arXiv Detail & Related papers (2024-02-26T16:35:59Z)
Finding Vulnerabilities in Mobile Application APIs: A Modular Programmatic Approach [0.0]
Application Programming Interfaces (APIs) are becoming increasingly popular to transfer data in a variety of mobile applications. These APIs often process sensitive user information through their endpoints, which are potentially exploitable due to developer mis implementation. This paper created a custom, modular endpoint vulnerability detection tool to analyze information leakage in various mobile Android applications.
arXiv Detail & Related papers (2023-10-22T00:08:51Z)
How well does LLM generate security tests? [8.454827764115631]
Developers often build software on top of third-party libraries (Libs) to improve productivity and software quality. People refer to such attacks as supply chain attacks, the documented number of which has increased 742% in 2022. We used ChatGPT-4.0 to generate security tests, and to demonstrate how vulnerable library dependencies facilitate the supply chain attacks to given Apps.
arXiv Detail & Related papers (2023-10-01T16:00:58Z)
Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries. We propose a novel framework that emulates the process of programmers writing private code. We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance. We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.