Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking
- URL: http://arxiv.org/abs/2512.21236v1
- Date: Wed, 24 Dec 2025 15:25:31 GMT
- Title: Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking
- Authors: Yifan Huang, Xiaojun Jia, Wenbo Guo, Yuqiang Sun, Yihao Huang, Chong Wang, Yang Liu
- Abstract summary: Large language models (LLMs) have revolutionized software development through AI-assisted coding tools. This accessibility extends to malicious actors who may exploit these powerful tools to generate harmful software. We propose SPELL, a comprehensive testing framework specifically designed to evaluate the weakness of security alignment in malicious code generation.
- Score: 23.54890959996959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have revolutionized software development through AI-assisted coding tools, enabling developers with limited programming expertise to create sophisticated applications. However, this accessibility extends to malicious actors who may exploit these powerful tools to generate harmful software. Existing jailbreaking research primarily focuses on general attack scenarios against LLMs, with limited exploration of malicious code generation as a jailbreak target. To address this gap, we propose SPELL, a comprehensive testing framework specifically designed to evaluate the weakness of security alignment in malicious code generation. Our framework employs a time-division selection strategy that systematically constructs jailbreaking prompts by intelligently combining sentences from a prior knowledge dataset, balancing exploration of novel attack patterns with exploitation of successful techniques. Extensive evaluation across three advanced code models (GPT-4.1, Claude-3.5, and Qwen2.5-Coder) demonstrates SPELL's effectiveness, achieving attack success rates of 83.75%, 19.38%, and 68.12% respectively across eight malicious code categories. The generated prompts successfully produce malicious code in real-world AI development tools such as Cursor, with outputs confirmed as malicious by state-of-the-art detection systems at rates exceeding 73%. These findings reveal significant security gaps in current LLM implementations and provide valuable insights for improving AI safety alignment in code generation applications.
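The abstract describes the time-division selection strategy only at a high level: candidate sentences from a prior knowledge dataset are combined into prompts while balancing exploration of new combinations against exploitation of previously successful ones. As a rough intuition for what such an exploration/exploitation trade-off can look like, the sketch below shows a generic UCB1-style candidate selector in Python; it is a minimal illustration only, not the paper's actual algorithm, and the function name `ucb_select`, the `stats` layout, and the exploration weight `c` are assumptions introduced here for clarity.

```python
import math

def ucb_select(candidates, stats, total_pulls, c=1.4):
    """Return the index of the next candidate to try (generic UCB1 sketch).

    candidates  -- list of candidate items (abstract placeholders here)
    stats[i]    -- (times_selected, times_successful) for candidate i
    total_pulls -- total number of selections made so far
    c           -- exploration weight; larger values favour under-tried candidates
    """
    best_idx, best_score = None, float("-inf")
    for i in range(len(candidates)):
        n, successes = stats[i]
        if n == 0:
            return i  # pure exploration: try every candidate at least once
        exploit = successes / n                              # empirical success rate
        explore = c * math.sqrt(math.log(total_pulls) / n)   # uncertainty bonus
        score = exploit + explore
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

In a bandit formulation like this, frequently successful candidates dominate the exploit term while rarely tried candidates keep a large exploration bonus, which is one standard way to realise the exploration/exploitation balance the abstract alludes to.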
Related papers
- Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak [27.520381454182147]
This study investigates the safety of large language models (LLMs) in automated algorithm design. We introduce MalOptBench, a benchmark consisting of 60 malicious optimization algorithm requests, and propose MOBjailbreak. We reveal that most models remain highly susceptible to such attacks, with an average attack success rate of 83.59% and an average harmfulness score of 4.28 out of 5 on original harmful prompts.
arXiv Detail & Related papers (2026-01-01T05:14:32Z) - When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation [20.940139710065306]
This work investigates the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions due to their complex security model involving multiple privilege boundaries and isolated components. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, and then analyzed them for vulnerabilities.
arXiv Detail & Related papers (2025-10-19T13:19:20Z) - A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code [49.009041488527544]
A.S.E is a repository-level evaluation benchmark for assessing the security of AI-generated code. Current large language models (LLMs) still struggle with secure coding. A larger reasoning budget does not necessarily lead to better code generation.
arXiv Detail & Related papers (2025-08-25T15:11:11Z) - ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning [64.32925552574115]
ARMOR is a large language model that analyzes jailbreak strategies and extracts the core intent. ARMOR achieves state-of-the-art safety performance, with an average harmful rate of 0.002 and an attack success rate of 0.06 against advanced optimization-based jailbreaks.
arXiv Detail & Related papers (2025-07-14T09:05:54Z) - MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation [22.29476520010842]
Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications. This accessibility extends to malicious software development, raising significant security concerns. In this paper, we introduce the Malware Generation Compiler (MGC), a novel framework that leverages this vulnerability through modular decomposition and alignment-evasive generation.
arXiv Detail & Related papers (2025-07-02T18:00:49Z) - LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges [70.85114705489222]
We propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code-generation. MalwareBench is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories. Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak methods further reduces the model's security capabilities.
arXiv Detail & Related papers (2025-06-09T12:02:39Z) - Towards Action Hijacking of Large Language Model-based Agent [23.13653350521422]
We introduce AI$\mathbf{2}$, a novel attack to manipulate the action plans of LLM-based applications. It first collects action-aware knowledge from the victim application. Based on such knowledge, the attacker can generate misleading input, which can mislead the LLM to generate harmful action plans.
arXiv Detail & Related papers (2024-12-14T12:11:26Z) - Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [50.980446687774645]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z) - How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python? [42.119319820752324]
We studied GPT-3.5 and GPT-4's capability to identify and repair vulnerabilities in the code generated by four popular LLMs. By manually or automatically reviewing 4,900 pieces of code, our study reveals that large language models lack awareness of scenario-relevant security risks. To address the limitation of a single round of repair, we developed a lightweight tool that prompts LLMs to construct safer source code.
arXiv Detail & Related papers (2024-08-20T02:42:29Z) - ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs [56.46702494338318]
This paper introduces a new attack paradigm: (automatic) external prompt injection against code-oriented large language models. We propose ShadowCode, a simple yet effective method that automatically generates induced perturbations based on code simulation. We evaluate our method across 13 distinct malicious objectives, generating 31 threat cases spanning three popular programming languages.
arXiv Detail & Related papers (2024-07-12T10:59:32Z) - Codexity: Secure AI-assisted Code Generation [11.114499124198268]
We present Codexity, a security-focused code generation framework integrated with five Large Language Models.
Our evaluation in a real-world benchmark with 751 automatically generated vulnerable subjects demonstrates Codexity can prevent 60% of the vulnerabilities being exposed to the software developer.
arXiv Detail & Related papers (2024-05-07T01:11:14Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.