DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial
Natural Language Instructions
- URL: http://arxiv.org/abs/2312.04730v2
- Date: Tue, 12 Dec 2023 19:42:11 GMT
- Title: DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial
Natural Language Instructions
- Authors: Fangzhou Wu, Xiaogeng Liu, Chaowei Xiao
- Abstract summary: We introduce DeceptPrompt, an algorithm that can generate adversarial natural language instructions that drive the Code LLMs to generate functionality correct code with vulnerabilities.
When applying the optimized prefix/suffix, the attack success rate (ASR) will improve by average 50% compared with no prefix/suffix applying.
- Score: 27.489622263456983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of Large Language Models (LLMs), significant progress
has been made in code generation, enabling LLMs to transform natural language
into programming code. These Code LLMs have been widely accepted by massive
users and organizations. However, a dangerous nature is hidden in the code,
which is the existence of fatal vulnerabilities. While some LLM providers have
attempted to address these issues by aligning with human guidance, these
efforts fall short of making Code LLMs practical and robust. Without a deep
understanding of the performance of the LLMs under the practical worst cases,
it would be concerning to apply them to various real-world applications. In
this paper, we answer the critical issue: Are existing Code LLMs immune to
generating vulnerable code? If not, what is the possible maximum severity of
this issue in practical deployment scenarios? In this paper, we introduce
DeceptPrompt, a novel algorithm that can generate adversarial natural language
instructions that drive the Code LLMs to generate functionality correct code
with vulnerabilities. DeceptPrompt is achieved through a systematic
evolution-based algorithm with a fine grain loss design. The unique advantage
of DeceptPrompt enables us to find natural prefix/suffix with totally benign
and non-directional semantic meaning, meanwhile, having great power in inducing
the Code LLMs to generate vulnerable code. This feature can enable us to
conduct the almost-worstcase red-teaming on these LLMs in a real scenario,
where users are using natural language. Our extensive experiments and analyses
on DeceptPrompt not only validate the effectiveness of our approach but also
shed light on the huge weakness of LLMs in the code generation task. When
applying the optimized prefix/suffix, the attack success rate (ASR) will
improve by average 50% compared with no prefix/suffix applying.
Related papers
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z) - Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [69.99031792995348]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - Transfer Attacks and Defenses for Large Language Models on Coding Tasks [30.065641782962974]
We study the effect of adversarial perturbations on coding tasks with large language models (LLMs)
We propose prompt-based defenses that involve modifying the prompt to include examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations.
Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance.
arXiv Detail & Related papers (2023-11-22T15:11:35Z) - Test-Case-Driven Programming Understanding in Large Language Models for
Better Code Generation [15.166827643436346]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs)
It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.
muFiX further fixes the specification understanding towards the direction reducing the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z) - DoLa: Decoding by Contrasting Layers Improves Factuality in Large
Language Models [79.01926242857613]
Large language models (LLMs) are prone to hallucinations, generating content that deviates from facts seen during pretraining.
We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs.
We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts.
arXiv Detail & Related papers (2023-09-07T17:45:31Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.