When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?
- URL: http://arxiv.org/abs/2510.17862v1
- Date: Wed, 15 Oct 2025 17:16:36 GMT
- Title: When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?
- Authors: Yibo Peng, James Song, Lei Li, Xinyu Yang, Mihai Christodorescu, Ravi Mangal, Corina Pasareanu, Haizhong Zheng, Beidi Chen,
- Abstract summary: Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub.<n>In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches.<n>We show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat.
- Score: 32.85968956009615
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack only requires black-box access and a single query to the code agent to perform the attack. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of $40.7\%$ on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.
Related papers
- AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection [8.909533914802669]
We introduce AgenticSCR, an agentic AI for secure code review for detecting immature vulnerabilities during the pre-commit stage.<n>We empirically evaluate how accurate is AgenticSCR for localizing, detecting, and explaining immature vulnerabilities.
arXiv Detail & Related papers (2026-01-27T03:10:12Z) - Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain.<n>We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z) - RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents [70.24175620901538]
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters.<n>Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios.<n>We propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.
arXiv Detail & Related papers (2025-10-02T22:59:06Z) - Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z) - Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents.<n>The attack injects malicious behaviors into the model by modifying only the visual input.<n>We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents [74.6761188527948]
Computer-Use Agents (CUAs) with full system access pose significant security and privacy risks.<n>We investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces.<n>Our empirical study shows that current CUAs and BUAs can be deceived at rates of up to 51% and 100%, respectively, on certain platforms.
arXiv Detail & Related papers (2025-06-03T05:21:50Z) - XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants [11.9972177330089]
We propose a novel attack, Poisoning (XOXO) as it relies on adversarial code modifications that are semantically equivalent.<n>We achieve 7572% success rate across eleven models, including GPTnet v2 used in our attack.
arXiv Detail & Related papers (2025-03-18T14:20:54Z) - An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection [17.948513691133037]
We introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models.
By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures.
arXiv Detail & Related papers (2024-06-10T22:10:05Z) - Double Backdoored: Converting Code Large Language Model Backdoors to Traditional Malware via Adversarial Instruction Tuning Attacks [15.531860128240385]
This work investigates novel techniques for transitioning backdoors from the AI/ML domain to traditional computer malware.<n>We present MalInstructCoder, a framework designed to assess the cybersecurity vulnerabilities of instruction-tuned Code LLMs.<n>We conduct a comprehensive investigation into the exploitability of the code-specific instruction tuning process involving three state-of-the-art Code LLMs.
arXiv Detail & Related papers (2024-04-29T10:14:58Z) - LeapFrog: The Rowhammer Instruction Skip Attack [5.285478567449658]
We present a new type of Rowhammer gadget, called a LeapFrog gadget, which allows an adversary to subvert code execution.<n>The LeapFrog gadget manifests when the victim code stores the Program Counter (PC) value in the user or kernel stack.<n>This research also presents a systematic process to identify LeapFrog gadgets.
arXiv Detail & Related papers (2024-04-11T16:10:16Z) - Detecting Security Patches via Behavioral Data in Code Repositories [11.052678122289871]
We show a system to automatically identify security patches using only the developer behavior in the Git repository.
We showed we can reveal concealed security patches with an accuracy of 88.3% and F1 Score of 89.8%.
arXiv Detail & Related papers (2023-02-04T06:43:07Z) - Stealthy Backdoor Attack for Code Models [19.272856932095966]
Existing backdoor attacks on code models use unstealthy and easy-to-detect triggers.
This paper aims to investigate the vulnerability of code models with stealthy backdoor attacks.
We find that around 85% of adaptive triggers in AFRAIDOOR bypass the detection in the defense process.
arXiv Detail & Related papers (2023-01-06T13:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.