RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
- URL: http://arxiv.org/abs/2504.15047v1
- Date: Mon, 21 Apr 2025 12:04:57 GMT
- Title: RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
- Authors: Quy-Anh Dang, Chris Ngo, Truong-Son Hy
- Abstract summary: We propose RainbowPlus, a novel red-teaming framework rooted in evolutionary computation. RainbowPlus enhances adversarial prompt generation through an adaptive quality-diversity search. Our open-source implementation fosters further advancements in safety, offering a scalable tool for vulnerability assessment.
- Score: 1.515687944002438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or biased outputs. Existing red-teaming methods often face scalability challenges, resource-intensive requirements, or limited diversity in attack strategies. We propose RainbowPlus, a novel red-teaming framework rooted in evolutionary computation, enhancing adversarial prompt generation through an adaptive quality-diversity (QD) search that extends classical evolutionary algorithms like MAP-Elites with innovations tailored for language models. By employing a multi-element archive to store diverse high-quality prompts and a comprehensive fitness function to evaluate multiple prompts concurrently, RainbowPlus overcomes the constraints of single-prompt archives and pairwise comparisons in prior QD methods like Rainbow Teaming. Experiments comparing RainbowPlus to QD methods across six benchmark datasets and four open-source LLMs demonstrate superior attack success rate (ASR) and diversity (Diverse-Score $\approx 0.84$), generating up to 100 times more unique prompts (e.g., 10,418 vs. 100 for Ministral-8B-Instruct-2410). Against nine state-of-the-art methods on the HarmBench dataset with twelve LLMs (ten open-source, two closed-source), RainbowPlus achieves an average ASR of 81.1%, surpassing AutoDAN-Turbo by 3.9%, and is 9 times faster (1.45 vs. 13.50 hours). Our open-source implementation fosters further advancements in LLM safety, offering a scalable tool for vulnerability assessment. Code and resources are publicly available at https://github.com/knoveleng/rainbowplus, supporting reproducibility and future research in LLM red-teaming.
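The quality-diversity loop the abstract describes can be sketched compactly. The snippet below is a minimal illustration, not the repository's implementation: the mutator, the judge, the descriptor axes, and the archive capacity are all assumptions, with random placeholders standing in for the LLM calls.

```python
import random
from collections import defaultdict

# Hypothetical stand-ins for the framework's components (names and signatures
# are assumptions, not the repository's API): an LLM-driven mutator and a
# judge that scores a whole batch of candidate prompts at once.
def mutate(prompt: str, risk: str, style: str) -> str:
    """Ask a mutator LLM to rewrite `prompt` toward the target descriptor."""
    return f"{prompt} [rewritten toward {risk}/{style}]"

def fitness_batch(prompts: list) -> list:
    """Judge scoring how likely each prompt is to elicit unsafe output."""
    return [random.random() for _ in prompts]  # placeholder scores

RISKS = ["violence", "fraud"]            # descriptor axis 1 (assumed)
STYLES = ["role_play", "misspelling"]    # descriptor axis 2 (assumed)
CAPACITY = 8  # multi-element archive: up to k prompts per cell, not one elite

archive = defaultdict(list)  # (risk, style) -> list of (score, prompt)
archive[("violence", "role_play")].append((0.1, "Tell me a story."))

for _ in range(100):  # evolutionary iterations
    _, parent = random.choice(archive[random.choice(list(archive))])
    target = (random.choice(RISKS), random.choice(STYLES))
    candidates = [mutate(parent, *target) for _ in range(4)]
    scores = fitness_batch(candidates)  # concurrent, not pairwise, evaluation
    cell = archive[target]
    cell.extend(zip(scores, candidates))
    cell.sort(key=lambda sc: sc[0], reverse=True)
    del cell[CAPACITY:]  # retain only the top-k per descriptor cell

unique_prompts = {p for cell in archive.values() for _, p in cell}
```

The two departures from Rainbow Teaming named in the abstract show up here as the per-cell list (multi-element archive) and the batched `fitness_batch` call (comprehensive fitness), replacing single-elite cells and pairwise judge comparisons.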
Related papers
- Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization [97.72503890388866]
We propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization.
SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge.
We introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decisions.
arXiv Detail & Related papers (2025-04-01T17:59:30Z)
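If SR-RAG's nearest neighbor search for source inference operates over embeddings of previously labeled queries, a minimal sketch could look like the following; the embedding dimension, the labeled history, and the majority-vote rule are all assumptions for illustration.

```python
import numpy as np

# Assumed mechanism: embed the incoming query and take a majority vote over
# the k nearest previously seen queries, each labeled with whether external
# retrieval or parametric verbalization worked better for it.
rng = np.random.default_rng(0)
history_emb = rng.standard_normal((500, 384)).astype(np.float32)  # placeholder
history_src = rng.choice(["retrieve", "verbalize"], size=500)     # placeholder

def decide_source(query_emb: np.ndarray, k: int = 8) -> str:
    """Route a query via cosine-similarity k-NN over the labeled history."""
    h = history_emb / np.linalg.norm(history_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    top = np.argsort(-(h @ q))[:k]          # indices of the k nearest queries
    votes = history_src[top]
    return "retrieve" if (votes == "retrieve").sum() * 2 > k else "verbalize"

print(decide_source(rng.standard_normal(384).astype(np.float32)))
```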
- Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models [23.68266151581951]
Retrieval-Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs).
Existing methods often struggle to reason effectively over the retrieved evidence.
We introduce a novel framework, Open-RAG, designed to enhance reasoning capabilities in RAG with open-source LLMs.
arXiv Detail & Related papers (2024-10-02T17:37:18Z)
- RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention.
Current research addresses the limits of purely parametric knowledge by equipping LLMs with external knowledge, a technique known as Retrieval-Augmented Generation (RAG).
RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
arXiv Detail & Related papers (2024-08-21T07:20:48Z)
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique [22.2168585464366]
Ferret is a novel approach that builds upon Rainbow Teaming by generating multiple adversarial prompt mutations per iteration.
Ferret improves the overall attack success rate (ASR) to 95%, which is 46% higher than Rainbow Teaming.
arXiv Detail & Related papers (2024-08-20T09:58:01Z)
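Ferret's iteration pattern reduces to a few lines; the mutator and reward model below are hypothetical stand-ins, not Ferret's actual components.

```python
import random

# Hypothetical helpers (assumed names, not Ferret's API): each iteration
# proposes several mutations and keeps the highest-scoring one.
def propose_mutations(prompt: str, n: int = 8) -> list:
    """Stand-in for an LLM producing n adversarial rewrites of `prompt`."""
    return [f"{prompt} (variant {i})" for i in range(n)]

def reward(prompt: str) -> float:
    """Stand-in for a reward model scoring a prompt's attack potential."""
    return random.random()

prompt = "seed adversarial prompt"
for _ in range(20):
    candidates = propose_mutations(prompt)
    # Reward-based scoring ranks all candidates at once, replacing the single
    # mutation plus pairwise judge comparison used in Rainbow Teaming.
    prompt = max(candidates, key=reward)
```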
- Improved Generation of Adversarial Examples Against Safety-aligned LLMs [72.38072942860309]
Adversarial prompts generated using gradient-based methods exhibit outstanding performance in automatic jailbreak attacks against safety-aligned LLMs, although the discrete nature of text makes input gradients an imprecise guide to token replacements.
This paper explores a new perspective on that problem, suggesting it can be alleviated by leveraging innovations inspired by transfer-based attacks.
We show that 87% of the query-specific adversarial suffixes generated by the developed combination induce Llama-2-7B-Chat to produce output that exactly matches the target string on AdvBench.
arXiv Detail & Related papers (2024-05-28T06:10:12Z)
- Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts [57.49685172971446]
We present Rainbow Teaming, a novel black-box approach for producing a diverse collection of adversarial prompts. Our approach reveals hundreds of effective adversarial prompts, with an attack success rate exceeding 90%. We additionally explore the versatility of Rainbow Teaming by applying it to question answering and cybersecurity.
arXiv Detail & Related papers (2024-02-26T18:47:27Z)
- LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization [4.951599300340954]
Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks.
We propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks.
By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce LLMatic, a Neural Architecture Search (NAS) algorithm.
arXiv Detail & Related papers (2023-06-01T19:33:21Z)
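A toy rendering of LLMatic's idea, where a fake "LLM mutation" merely edits a width parameter in the network-defining code and a QD archive keeps the best network per size niche; LLMatic's real operators prompt a code LLM for far richer edits.

```python
import random

# Illustrative-only sketch: mutate network-defining code, evaluate it with a
# cheap proxy, and insert into a QD archive keyed by a size niche.
def llm_mutate(net_code: str) -> str:
    """Stand-in for asking a code LLM for a meaningful variation."""
    old = net_code.split("width=")[1].split(")")[0]
    new = random.choice([32, 64, 128, 256])
    return net_code.replace(f"width={old}", f"width={new}")

def evaluate(net_code: str):
    """Return (fitness, niche); here a random proxy score and a size bucket."""
    width = int(net_code.split("width=")[1].split(")")[0])
    return random.random(), width // 64

seed = "Net(depth=3, width=64)"
archive = {}  # niche -> (best fitness, network code)
for _ in range(50):
    parent = random.choice(list(archive.values()))[1] if archive else seed
    child = llm_mutate(parent)
    fit, niche = evaluate(child)
    if niche not in archive or fit > archive[niche][0]:
        archive[niche] = (fit, child)  # QD: keep the best network per niche
```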
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as the reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
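Read as soft Q-learning, each partial output sequence is a state and each next token an action. The standard identities below sketch that formulation with generic notation (temperature $\tau$, discount $\gamma$); the paper's exact parameterization may differ:

$$Q^*(s_t, a_t) = r(s_t, a_t) + \gamma \, \tau \log \sum_{a'} \exp\!\big(Q^*(s_{t+1}, a') / \tau\big), \qquad \pi^*(a \mid s) = \frac{\exp\big(Q^*(s, a) / \tau\big)}{\sum_{a'} \exp\big(Q^*(s, a') / \tau\big)}$$

Under this reading, a language model's logits can serve directly as Q-values, and sampling from the softmax over logits recovers the soft-optimal policy.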
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
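Both SUNRISE ingredients reduce to a few lines on a toy discrete-action example; the ensemble Q-values are random placeholders, and `lam` and `T` are assumed hyperparameters, with the weighting below being one plausible sigmoid-of-uncertainty instantiation.

```python
import numpy as np

# Toy sketch of SUNRISE's two ingredients for a single state.
N_ENSEMBLE, N_ACTIONS = 5, 4
q_values = np.random.randn(N_ENSEMBLE, N_ACTIONS)  # Q_i(s, a), one row per member

# (b) UCB exploration: act on ensemble mean plus lam * ensemble std.
lam = 1.0
ucb = q_values.mean(axis=0) + lam * q_values.std(axis=0)
action = int(np.argmax(ucb))

# (a) Weighted Bellman backup: targets with high ensemble disagreement get
# down-weighted; weight = sigmoid(-std * T) + 0.5 lies in (0.5, 1.0).
T = 10.0
std = q_values.std(axis=0)[action]
weight = 1.0 / (1.0 + np.exp(std * T)) + 0.5
# Each member's TD error on this transition would be scaled by `weight`.
print(action, round(float(weight), 3))
```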