Diffusion LLMs are Natural Adversaries for any LLM
- URL: http://arxiv.org/abs/2511.00203v1
- Date: Fri, 31 Oct 2025 19:04:09 GMT
- Title: Diffusion LLMs are Natural Adversaries for any LLM
- Authors: David Lüdke, Tom Wollschläger, Paul Ungermann, Stephan Günnemann, Leo Schwinn,
- Abstract summary: We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an emphefficient, amortized inference task<n>Our core insight is that pretrained, non-autoregressive generative LLMs, can serve as powerful surrogates for prompt search.<n>We find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models.
- Score: 50.88535293540971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an \emph{efficient, amortized inference task}. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pairs, can serve as powerful surrogates for prompt search. This approach enables the direct conditional generation of prompts, effectively replacing costly, per-instance discrete optimization with a small number of parallelizable samples. We provide a probabilistic analysis demonstrating that under mild fidelity assumptions, only a few conditional samples are required to recover high-reward (harmful) prompts. Empirically, we find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models, including robustly trained and proprietary LLMs. Beyond adversarial prompting, our framework opens new directions for red teaming, automated prompt optimization, and leveraging emerging Flow- and Diffusion-based LLMs.
Related papers
- Prompt Optimization Via Diffusion Language Models [73.9599434962714]
We propose a diffusion-based framework for prompt optimization.<n>Our method enables flexible, span-level prompt updates without requiring access or modifying the downstream language model.<n>We show that moderate diffusion step counts provide the best balance between refinement quality and stability.
arXiv Detail & Related papers (2026-01-30T00:00:54Z) - From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation [3.75886080255807]
We introduce two innovative attack frameworks designed to generate dynamic and adaptive adversarial examples.<n>We produce subtle and natural-looking adversarial inputs that preserve semantic similarity to the original text.<n>Our attacks evolve with the advancements in LLMs and demonstrate strong transferability acrossversa unknown to the attacker.
arXiv Detail & Related papers (2025-11-05T02:27:56Z) - Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [65.18157595903124]
This work investigates iterative approximate evaluation for arbitrary prompts.<n>It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework.<n>MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z) - VERA: Variational Inference Framework for Jailbreaking Large Language Models [15.03256687264469]
API-only access to state-of-the-art LLMs highlights the need for effective black-box jailbreak methods.<n>We introduce VERA: Variational infErence fRamework for jAilbreaking.
arXiv Detail & Related papers (2025-06-27T22:22:00Z) - Accelerating Diffusion LLMs via Adaptive Parallel Decoding [60.407727995313074]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel.<n>APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z) - Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks [37.326754557721586]
Large language models (LLMs) have shown promise in biomolecule optimization problems.<n> specialized solvers like LaMBO-2 offer efficiency and fine-grained control but require more domain expertise.<n>We address this by introducing Ehrlich functions, a synthetic test suite that captures the geometric structure of biophysical sequence optimization problems.
arXiv Detail & Related papers (2024-10-29T17:45:57Z) - SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration [10.970637831760136]
Speculative decoding (SD) has emerged as a widely used paradigm to accelerate LLM inference without compromising quality.<n>We introduce SWIFT, an on-the-fly self-speculative decoding algorithm that adaptively selects intermediate layers of LLMs to skip during inference.<n>Our experiments demonstrate that SWIFT can achieve over a 1.3x-1.6x speedup while preserving the original distribution of the generated text.
arXiv Detail & Related papers (2024-10-09T14:15:30Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs [30.333277284839053]
Large language models (LLMs) have shown success in generating high-quality responses.
Existing methods to enhance response quality often involve a prompt refinement model.
We introduce a self-instructed in-context learning framework that empowers LLMs to deliver more effective responses.
arXiv Detail & Related papers (2024-09-03T02:42:39Z) - Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming [37.32997502058661]
This paper introduces the textbfsentinel model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few tokens.
The sentinel model naturally overcomes the textit parameter inefficiency and textitlimited model accessibility for fine-tuning large target models.
Our experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs.
arXiv Detail & Related papers (2024-05-21T08:57:44Z) - Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization.
Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge.
We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimize the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z) - Prompt Highlighter: Interactive Control for Multi-Modal LLMs [50.830448437285355]
This study targets a critical aspect of multi-modal LLMs' (LLMs&VLMs) inference: explicit controllable text generation.
We introduce a novel inference method, Prompt Highlighter, which enables users to highlight specific prompt spans to interactively control the focus during generation.
We find that, during inference, guiding the models with highlighted tokens through the attention weights leads to more desired outputs.
arXiv Detail & Related papers (2023-12-07T13:53:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.