XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
- URL: http://arxiv.org/abs/2503.14281v1
- Date: Tue, 18 Mar 2025 14:20:54 GMT
- Title: XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
- Authors: Adam Štorek, Mukur Gupta, Noopur Bhatt, Aditya Gupta, Janie Kim, Prashast Srivastava, Suman Jana
- Abstract summary: We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that relies on adversarial code modifications that are semantically equivalent and therefore difficult for traditional program analysis techniques to detect. This allows attackers to manipulate coding assistants into producing incorrect outputs, including backdoors, while shifting the blame to the victim.
- Score: 11.9972177330089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI coding assistants are widely used for tasks like code generation, bug detection, and comprehension. These tools now require large and complex contexts, automatically sourced from various origins (across files, projects, and contributors), forming part of the prompt fed to underlying LLMs. This automatic context-gathering introduces new vulnerabilities, allowing attackers to subtly poison input to compromise the assistant's outputs, potentially generating vulnerable code, overlooking flaws, or introducing critical errors. We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that is particularly challenging to detect as it relies on adversarial code modifications that are semantically equivalent. Traditional program analysis techniques struggle to identify these correlations since the semantics of the code remain correct, making it appear legitimate. This allows attackers to manipulate code assistants into producing incorrect outputs, including vulnerabilities or backdoors, while shifting the blame to the victim developer or tester. We introduce a novel, task-agnostic black-box attack algorithm GCGS that systematically searches the transformation space using a Cayley Graph, achieving an 83.09% attack success rate on average across five tasks and eleven models, including GPT-4o and Claude 3.5 Sonnet v2 used by many popular AI coding assistants. Furthermore, existing defenses, including adversarial fine-tuning, are ineffective against our attack, underscoring the need for new security measures in LLM-powered coding tools.
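As a concrete illustration of the threat model, the sketch below shows the kind of semantics-preserving rewrite the attack depends on, applied to a file from another origin before it is pulled into the assistant's context. This is a minimal, hypothetical example: the `rename_locals` helper, the identifier mapping, and the snippet are assumptions for illustration, not the paper's GCGS algorithm or its actual transformation set.

```python
# Hypothetical sketch of a semantics-preserving rewrite of the kind XOXO
# relies on. The transformed code computes exactly the same result, so
# program analysis sees no behavioral change, yet the surface form that the
# assistant ingests as context is attacker-chosen. The helper, the mapping,
# and the snippet are illustrative, not the paper's GCGS algorithm.
import ast


def rename_locals(source: str, mapping: dict[str, str]) -> str:
    """Rename identifiers per `mapping`; program behavior is unchanged."""
    tree = ast.parse(source)

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.AST:
            node.id = mapping.get(node.id, node.id)
            return node

        def visit_arg(self, node: ast.arg) -> ast.AST:
            node.arg = mapping.get(node.arg, node.arg)
            return node

    return ast.unparse(Renamer().visit(tree))


# A benign snippet from another origin (a different file or contributor)...
original = (
    "def check(token, secret):\n"
    "    valid = token == secret\n"
    "    return valid\n"
)

# ...rewritten into an equivalent variant whose misleading names can bias the
# assistant's completion for the victim's file once it enters the context.
poisoned = rename_locals(
    original,
    {"valid": "skip_auth", "token": "debug_override", "secret": "default_flag"},
)
print(poisoned)
```

Because the rewritten code behaves identically to the original, diff review and conventional static analysis have nothing to flag, yet the misleading surface form can steer the assistant's output for the victim's file, leaving the blame with the developer who accepted the suggestion.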
Related papers
- Fooling the Decoder: An Adversarial Attack on Quantum Error Correction [49.48516314472825]
In this work, we target a basic RL surface code decoder (DeepQ) to create the first adversarial attack on quantum error correction.
We demonstrate an attack that reduces the logical qubit lifetime in memory experiments by up to five orders of magnitude.
This attack highlights the susceptibility of machine learning-based QEC and underscores the importance of further research into robust QEC methods.
arXiv Detail & Related papers (2025-04-28T10:10:05Z) - TPIA: Towards Target-specific Prompt Injection Attack against Code-oriented Large Language Models [21.206040279980858]
This paper presents a novel attack paradigm against Code LLMs, namely target-specific prompt injection attack (TPIA).
TPIA generates non-functional perturbations containing the information of malicious instructions and inserts them into the victim's code context.
We show that our TPIA can successfully attack three representative open-source Code LLMs and two mainstream commercial Code LLM-integrated applications.
arXiv Detail & Related papers (2024-07-12T10:59:32Z) - An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection [17.948513691133037]
We introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models.
By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures.
arXiv Detail & Related papers (2024-06-10T22:10:05Z) - Double Backdoored: Converting Code Large Language Model Backdoors to Traditional Malware via Adversarial Instruction Tuning Attacks [15.531860128240385]
This work investigates novel techniques for transitioning backdoors from the AI/ML domain to traditional computer malware.
We present MalInstructCoder, a framework designed to assess the cybersecurity vulnerabilities of instruction-tuned Code LLMs.
We conduct a comprehensive investigation into the exploitability of the code-specific instruction tuning process involving three state-of-the-art Code LLMs.
arXiv Detail & Related papers (2024-04-29T10:14:58Z) - LeapFrog: The Rowhammer Instruction Skip Attack [5.285478567449658]
We present a new type of Rowhammer gadget, called a LeapFrog gadget, which allows an adversary to subvert code execution.
The LeapFrog gadget manifests when the victim code stores the Program Counter (PC) value in the user or kernel stack.
This research also presents a systematic process to identify LeapFrog gadgets.
arXiv Detail & Related papers (2024-04-11T16:10:16Z) - Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks [9.386731514208149]
This work investigates the security of AI code generators by devising a targeted data poisoning strategy.
We poison the training data by injecting increasing amounts of code containing security vulnerabilities.
Our study shows that AI code generators are vulnerable to even a small amount of poison.
arXiv Detail & Related papers (2023-08-04T15:23:30Z) - Universal and Transferable Adversarial Attacks on Aligned Language Models [118.41733208825278]
We propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors.
Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable.
arXiv Detail & Related papers (2023-07-27T17:49:12Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder: one pre-processing defense, three in-processing defenses, and one post-processing defense.
arXiv Detail & Related papers (2022-05-13T00:15:44Z) - Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z) - Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)