Related papers: APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls

APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls

URL: http://arxiv.org/abs/2409.16526v1
Date: Wed, 25 Sep 2024 00:37:40 GMT
Title: APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls
Authors: Weiheng Bai, Keyang Xuan, Pengxiang Huang, Qiushi Wu, Jianing Wen, Jingjing Wu, Kangjie Lu,
Abstract summary: APILOT maintains a realtime, quickly updatable dataset of outdated APIs. It uses an augmented generation method to navigate LLMs in generating secure, version-aware code. It can reduce outdated code recommendations by 89.42% on average with limited performance overhead.
Score: 15.865915079829943
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid development of large language models (LLMs), their applications have expanded into diverse fields, such as code assistance. However, the substantial size of LLMs makes their training highly resource- and time-intensive, rendering frequent retraining or updates impractical. Consequently, time-sensitive data can become outdated, potentially misleading LLMs in time-aware tasks. For example, new vulnerabilities are discovered in various programs every day. Without updating their knowledge, LLMs may inadvertently generate code that includes these newly discovered vulnerabilities. Current strategies, such as prompt engineering and fine-tuning, do not effectively address this issue. To address this issue, we propose solution, named APILOT, which maintains a realtime, quickly updatable dataset of outdated APIs. Additionally, APILOT utilizes an augmented generation method that leverages this dataset to navigate LLMs in generating secure, version-aware code. We conducted a comprehensive evaluation to measure the effectiveness of APILOT in reducing the incidence of outdated API recommendations across seven different state-of-the-art LLMs. The evaluation results indicate that APILOT can reduce outdated code recommendations by 89.42% on average with limited performance overhead. Interestingly, while enhancing security, APILOT also improves the usability of the code generated by LLMs, showing an average increase of 27.54% in usability. This underscores APILOT's dual capability to enhance both the safety and practical utility of code suggestions in contemporary software development environments.

Related papers

Inducing Vulnerable Code Generation in LLM Coding Assistants [10.067898047221558]
In this paper, we reveal a real-world threat, named HACKODE, where attackers exploit referenced external information to embed attack sequences. We designed a prototype of the attack, which generates effective attack sequences for potential diverse inputs. On a real-world application, HACKODE achieves 75.92% ASR, demonstrating its real-world impact.
arXiv Detail & Related papers (2025-04-22T13:09:20Z)
Identifying and Mitigating API Misuse in Large Language Models [26.4403427473915]
API misuse in code generated by large language models (LLMs) represents a serious emerging challenge in software development. This paper presents the first comprehensive study of API misuse patterns in LLM-generated code, analyzing both method selection and parameter usage across Python and Java. We propose Dr.Fix, a novel LLM-based automatic program repair approach for API misuse based on the aforementioned taxonomy.
arXiv Detail & Related papers (2025-03-28T18:43:12Z)
Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
ProSec: Fortifying Code LLMs with Proactive Security Alignment [14.907702430331803]
Security of code-specific large language models (LLMs) remains under-explored. We propose ProSec, a novel security alignment approach designed to align code LLMs with secure coding practices. Experiments show that models trained with ProSec is 29.2% to 35.5% more secure compared to previous work.
arXiv Detail & Related papers (2024-11-19T22:00:01Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that many LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation [17.69409515806874]
We present an exploratory study on whether fine-tuning pre-trained LLMs on datasets of vulnerability-fixing commits can promote secure code generation. We crawled a fine-tuning dataset for secure code generation by collecting code fixes of confirmed vulnerabilities from open-source repositories. Our exploration reveals that fine-tuning LLMs can improve secure code generation by 6.4% in C language and 5.4% in C++ language.
arXiv Detail & Related papers (2024-08-17T02:51:27Z)
Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience [4.814313782484443]
Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks. We use LLMs to generate domain-specific data, thereby improving the performance of pre-trained LMs on the target tasks.
arXiv Detail & Related papers (2024-08-16T06:37:59Z)
A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems [49.588316022381385]
We propose a Decoding Acceleration Framework for LLM-based Recommendation (dubbed DARE), with Customized Retrieval Pool to improve retrieval efficiency and Relaxed Verification to increase the acceptance rate of draft tokens. DARE has been deployed to online advertising scenarios within a large-scale commercial environment, achieving a 3.45x speedup while maintaining the downstream performance.
arXiv Detail & Related papers (2024-08-11T02:31:13Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development. We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM) We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs) We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
SALLM: Security Assessment of Generated Code [0.5137309756089941]
This paper describes SALLM, a framework to benchmark Large Language Models' abilities to generate secure code systematically. The framework has three major components: a novel dataset of security-centric Python prompts, assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
arXiv Detail & Related papers (2023-11-01T22:46:31Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
Automatically Recommend Code Updates: Are We There Yet? [14.997510035210842]
We present the first evaluation of state-of-the-art CodeLMs for automatically recommending code updates. Our results reveal that while CodeLMs perform well in settings that ignore temporal information, they struggle in more realistic time-wise scenarios. Our findings highlight the significant gap between the perceived and actual effectiveness of CodeLMs for real-world code update recommendation.
arXiv Detail & Related papers (2022-09-15T05:07:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.