Related papers: Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

URL: http://arxiv.org/abs/2510.22944v1
Date: Mon, 27 Oct 2025 02:59:17 GMT
Title: Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies
Authors: Bin Wang, YiLu Zhong, MiDi Wan, WenJie Yu, YuanBing Ouyang, Yenan Huang, Hui Li,
Abstract summary: Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern.<n>We propose an evaluation framework for prompt quality encompassing three key dimensions: goal clarity, information completeness, and logical consistency.<n>Our findings highlight that enhancing the quality of user prompts constitutes a critical and effective strategy for strengthening the security of AI-generated code.
Score: 4.435429537888066
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models. However, a more prevalent yet underexplored issue concerns how the quality of a benign but poorly formulated prompt affects the security of the generated code. To investigate this, we first propose an evaluation framework for prompt quality encompassing three key dimensions: goal clarity, information completeness, and logical consistency. Based on this framework, we construct and publicly release CWE-BENCH-PYTHON, a large-scale benchmark dataset containing tasks with prompts categorized into four distinct levels of normativity (L0-L3). Extensive experiments on multiple state-of-the-art LLMs reveal a clear correlation: as prompt normativity decreases, the likelihood of generating insecure code consistently and markedly increases. Furthermore, we demonstrate that advanced prompting techniques, such as Chain-of-Thought and Self-Correction, effectively mitigate the security risks introduced by low-quality prompts, substantially improving code safety. Our findings highlight that enhancing the quality of user prompts constitutes a critical and effective strategy for strengthening the security of AI-generated code.

Related papers

Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision [3.983997834693767]
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development.<n>LLMs cannot readily adapt to newly discovered vulnerabilities or changing security standards without retraining.<n>We present a principled approach to trustworthy code generation by design that operates as an inference-time safety mechanism.
arXiv Detail & Related papers (2026-03-02T06:06:34Z)
The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents [37.75212140218036]
We formalize the Loss-of-Control risk and identify the previously underexamined Intrinsic Value Misalignment (Intrinsic VM)<n>We then introduce IMPRESS, a scenario-driven framework for systematically assessing this risk.<n>We evaluate Intrinsic VM on 21 state-of-the-art LLM agents and find that it is a common and broadly observed safety risk across models.
arXiv Detail & Related papers (2026-01-24T07:09:50Z)
STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models [12.133996629992318]
We propose a parameter-free, inference-time unlearning framework that achieves robust privacy protection throughout the reasoning process.<n>Experiments on the R-TOFU benchmark demonstrate that STaR achieves comprehensive and stable unlearning with minimal utility loss.
arXiv Detail & Related papers (2026-01-14T08:35:23Z)
Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety [59.01189713115365]
We evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases.<n>We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness.<n>We propose CADA, a case-augmented deliberative alignment method for LLMs utilizing reinforcement learning on self-generated safety reasoning chains.
arXiv Detail & Related papers (2026-01-12T21:08:46Z)
CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention [68.95008546581339]
Existing decoding-time interventions, such as Contrastive Decoding, often force a severe trade-off between safety and response quality.<n>We propose CARE, a novel framework for decoding-time safety alignment that integrates three key components.<n>The framework achieves a superior balance of safety, quality, and efficiency, attaining a low harmful response rate and minimal disruption to the user experience.
arXiv Detail & Related papers (2025-09-01T04:50:02Z)
SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code [7.209766132478914]
We introduce SafeGenBench, a benchmark specifically designed to assess the security of LLM-generated code.<n>The dataset encompasses a wide range of common software development scenarios and vulnerability types.<n>Through the empirical evaluation of state-of-the-art LLMs on SafeGenBench, we reveal notable deficiencies in their ability to produce vulnerability-free code.
arXiv Detail & Related papers (2025-06-06T02:48:02Z)
Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning [53.92712851223158]
We formulate safety and privacy issues into contextualized compliance problems following the Contextual Integrity (CI) theory.<n>Under the CI framework, we align our model with three critical regulatory standards: EU AI Act, and HIPAA.<n>We employ reinforcement learning (RL) with a rule-based reward to incentivize contextual reasoning capabilities while enhancing compliance with safety and privacy norms.
arXiv Detail & Related papers (2025-05-20T16:40:09Z)
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs [83.11815479874447]
We propose a novel jailbreak attack framework, inspired by cognitive decomposition and biases in human cognition.<n>We employ cognitive decomposition to reduce the complexity of malicious prompts and relevance bias to reorganize prompts.<n>We also introduce a ranking-based harmfulness evaluation metric that surpasses the traditional binary success-or-failure paradigm.
arXiv Detail & Related papers (2025-05-03T05:28:11Z)
CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation [20.72188827088484]
Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing.<n> detecting vulnerabilities in functionally correct code is more challenging, especially for developers with limited security knowledge.<n>We introduce CWEval, a novel outcome-driven evaluation framework designed to enhance the evaluation of secure code generation by LLMs.
arXiv Detail & Related papers (2025-01-14T15:27:01Z)
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI [58.29510889419971]
Existing benchmarks for evaluating the security risks and capabilities of code-generating large language models (LLMs) face several key limitations.<n>We introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations.<n>Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities.
arXiv Detail & Related papers (2024-10-14T21:17:22Z)
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level [10.476222570886483]
Large language models (LLMs) have demonstrated immense utility across various industries.<n>As LLMs advance, the risk of harmful outputs increases due to incorrect or malicious instruction prompts.<n>This paper examines the LLMs' capability to recognize harmful outputs, revealing and quantifying their proficiency in assessing the danger of previous tokens.
arXiv Detail & Related papers (2024-10-09T12:09:30Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.