Related papers: Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

URL: http://arxiv.org/abs/2406.12513v1
Date: Tue, 18 Jun 2024 11:29:34 GMT
Title: Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs
Authors: Ahmad Mohsin, Helge Janicke, Adrian Wood, Iqbal H. Sarker, Leandros Maglaras, Naeem Janjua,
Abstract summary: Large Language Models (LLMs) have revolutionized automated code generation in software engineering. However, concerns have arisen regarding the security and quality of the generated code. Our research aims to tackle these issues by introducing a framework for secure behavioral learning of LLMs.
Score: 2.7138982369416866
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) such as ChatGPT and GitHub Copilot have revolutionized automated code generation in software engineering. However, as these models are increasingly utilized for software development, concerns have arisen regarding the security and quality of the generated code. These concerns stem from LLMs being primarily trained on publicly available code repositories and internet-based textual data, which may contain insecure code. This presents a significant risk of perpetuating vulnerabilities in the generated code, creating potential attack vectors for exploitation by malicious actors. Our research aims to tackle these issues by introducing a framework for secure behavioral learning of LLMs through In-Content Learning (ICL) patterns during the code generation process, followed by rigorous security evaluations. To achieve this, we have selected four diverse LLMs for experimentation. We have evaluated these coding LLMs across three programming languages and identified security vulnerabilities and code smells. The code is generated through ICL with curated problem sets and undergoes rigorous security testing to evaluate the overall quality and trustworthiness of the generated code. Our research indicates that ICL-driven one-shot and few-shot learning patterns can enhance code security, reducing vulnerabilities in various programming scenarios. Developers and researchers should know that LLMs have a limited understanding of security principles. This may lead to security breaches when the generated code is deployed in production systems. Our research highlights LLMs are a potential source of new vulnerabilities to the software supply chain. It is important to consider this when using LLMs for code generation. This research article offers insights into improving LLM security and encourages proactive use of LLMs for code generation to ensure software system safety.

Related papers

Large Language Model Unlearning for Source Code [65.42425213605114]
PROD is a novel unlearning approach that enables LLMs to forget undesired code content while preserving their code generation capabilities.<n>Our evaluation demonstrates that PROD achieves superior balance between forget quality and model utility compared to existing unlearning approaches.
arXiv Detail & Related papers (2025-06-20T16:27:59Z)
The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models [0.769672852567215]
This paper uses predefined security parameters to evaluate the security compliance of LLM-generated code across multiple models. The analysis reveals critical vulnerabilities in authentication mechanisms, session management, input validation and HTTP security headers. Our findings underscore that human expertise is crucial to ensure secure software deployment or review of LLM-generated code.
arXiv Detail & Related papers (2025-04-29T10:23:11Z)
Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis [10.268191178804168]
This paper analyzes the security of code generated by Large Language Models (LLMs) across different programming languages. Our research shows that while LLMs can automate code creation, their security effectiveness varies by language.
arXiv Detail & Related papers (2025-02-03T22:03:13Z)
Internal Activation as the Polar Star for Steering Unsafe LLM Behavior [50.463399903987245]
We introduce SafeSwitch, a framework that dynamically regulates unsafe outputs by monitoring and utilizing the model's internal states. Our empirical results show that SafeSwitch reduces harmful outputs by over 80% on safety benchmarks while maintaining strong utility.
arXiv Detail & Related papers (2025-02-03T04:23:33Z)
Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis [3.892345568697058]
Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence. Developers routinely ask LLMs to generate code snippets, increasing productivity but also introducing ownership, privacy, correctness, and security issues. Previous work highlighted how code generated by commercial LLMs is often not safe, containing vulnerabilities, bugs, and code smells.
arXiv Detail & Related papers (2024-12-19T13:34:14Z)
RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation [17.53405545690049]
Large language models (LLMs) have demonstrated remarkable potential with code generation/completion tasks for hardware design. LLMs are susceptible to so-called data poisoning or backdoor attacks. Here, attackers inject malicious code for the training data, which can be carried over into the HDL code generated by LLMs.
arXiv Detail & Related papers (2024-11-26T16:31:18Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that many LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation [17.69409515806874]
We present an exploratory study on whether fine-tuning pre-trained LLMs on datasets of vulnerability-fixing commits can promote secure code generation. We crawled a fine-tuning dataset for secure code generation by collecting code fixes of confirmed vulnerabilities from open-source repositories. Our exploration reveals that fine-tuning LLMs can improve secure code generation by 6.4% in C language and 5.4% in C++ language.
arXiv Detail & Related papers (2024-08-17T02:51:27Z)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair. However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities. We aim to present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Can LLMs Patch Security Issues? [1.3299507495084417]
Large Language Models (LLMs) have shown impressive proficiency in code generation. LLMs share a weakness with their human counterparts: producing code that inadvertently has security vulnerabilities. We propose Feedback-Driven Security Patching (FDSP), where LLMs automatically refine generated, vulnerable code.
arXiv Detail & Related papers (2023-11-13T08:54:37Z)
SALLM: Security Assessment of Generated Code [0.5137309756089941]
This paper describes SALLM, a framework to benchmark Large Language Models' abilities to generate secure code systematically. The framework has three major components: a novel dataset of security-centric Python prompts, assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
arXiv Detail & Related papers (2023-11-01T22:46:31Z)
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models. We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.