Multilingual and Multi-Accent Jailbreaking of Audio LLMs
- URL: http://arxiv.org/abs/2504.01094v1
- Date: Tue, 01 Apr 2025 18:12:23 GMT
- Title: Multilingual and Multi-Accent Jailbreaking of Audio LLMs
- Authors: Jaechul Roh, Virat Shejwalkar, Amir Houmansadr
- Abstract summary: Multi-AudioJail is the first systematic framework to exploit multilingual and multi-accent audio jailbreaks. We show how acoustic perturbations interact with cross-lingual phonetics to cause jailbreak success rates to surge. We plan to release our dataset to spur research into cross-modal defenses.
- Score: 19.5428160851918
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Audio Language Models (LALMs) have significantly advanced audio understanding but introduce critical security risks, particularly through audio jailbreaks. While prior work has focused on English-centric attacks, we expose a far more severe vulnerability: adversarial multilingual and multi-accent audio jailbreaks, where linguistic and acoustic variations dramatically amplify attack success. In this paper, we introduce Multi-AudioJail, the first systematic framework to exploit these vulnerabilities through (1) a novel dataset of adversarially perturbed multilingual/multi-accent audio jailbreaking prompts, and (2) a hierarchical evaluation pipeline revealing how acoustic perturbations (e.g., reverberation, echo, and whisper effects) interact with cross-lingual phonetics to cause jailbreak success rates (JSRs) to surge by up to +57.25 percentage points (e.g., a reverberated Kenyan-accented attack on MERaLiON). Crucially, our work further reveals that multimodal LLMs are inherently more vulnerable than unimodal systems: attackers need only exploit the weakest link (e.g., non-English audio inputs) to compromise the entire model, which we show empirically: multilingual audio-only attacks achieve 3.1x higher success rates than text-only attacks. We plan to release our dataset to spur research into cross-modal defenses, urging the community to address this expanding attack surface in multimodality as LALMs evolve.
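To make the evaluation concrete, here is a minimal sketch (not the authors' released code) of how an acoustic perturbation such as reverberation might be applied to spoken jailbreak prompts and how a jailbreak success rate (JSR) could be tallied; `query_lalm` and `is_jailbroken` are hypothetical stand-ins for the target audio LLM and a refusal judge.

```python
# Hedged sketch: synthetic reverb perturbation + JSR tally for red-teaming evaluation.
import numpy as np
from scipy.signal import fftconvolve

def add_reverb(wave: np.ndarray, sr: int, rt60: float = 0.4) -> np.ndarray:
    """Convolve the waveform with a decaying-noise impulse response (toy reverb)."""
    ir_len = int(sr * rt60)
    t = np.arange(ir_len) / sr
    ir = np.random.randn(ir_len) * np.exp(-6.91 * t / rt60)  # ~60 dB decay over rt60 seconds
    wet = fftconvolve(wave, ir)[: len(wave)]
    return wet / (np.max(np.abs(wet)) + 1e-9)  # renormalize to avoid clipping

def jailbreak_success_rate(prompts, sr, query_lalm, is_jailbroken) -> float:
    """JSR = fraction of perturbed audio prompts that elicit a non-refusal (harmful) response."""
    hits = 0
    for wave in prompts:                         # each prompt is a mono float32 waveform
        response = query_lalm(add_reverb(wave, sr))
        hits += int(is_jailbroken(response))     # judge decides whether the model complied
    return hits / len(prompts)
```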
Related papers
- MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning [56.79292318645454]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking.
This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data are often limited.
We propose an approach to build a multilingual guardrail with reasoning.
arXiv Detail & Related papers (2025-04-21T17:15:06Z) - Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks [59.87470192277124]
This paper explores methods of compromising speech translation systems through imperceptible audio manipulations. We present two approaches: (1) the injection of perturbations into source audio, and (2) the generation of adversarial music designed to guide targeted translation. Our experiments reveal that carefully crafted audio perturbations can mislead translation models into producing targeted, harmful outputs, while adversarial music achieves this goal more covertly. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems.
arXiv Detail & Related papers (2025-03-02T16:38:16Z) - Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation is a novel multi-turn jailbreak framework.
It reformulates harmful queries into benign reasoning tasks.
We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z) - `Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs [6.151779089440453]
We introduce the first voice-based jailbreak attack against multimodal large language models (LLMs). We propose a novel strategy in which the disallowed prompt is flanked by benign, narrative-driven prompts (a minimal sketch of this flanking structure appears after this list). We demonstrate that the Flanking Attack is capable of manipulating state-of-the-art LLMs into generating misaligned and forbidden outputs.
arXiv Detail & Related papers (2025-02-02T10:05:08Z) - "I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models [0.9480364746270077]
This paper explores audio jailbreaks targeting Audio-Language Models (ALMs). We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples. We analyze how ALMs interpret these audio adversarial examples and reveal that they encode imperceptible first-person toxic speech.
arXiv Detail & Related papers (2025-02-02T08:36:23Z) - MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue [35.7801861576917]
Large Language Models (LLMs) demonstrate outstanding knowledge and understanding capabilities. However, LLMs have been shown to produce illegal or unethical responses when subjected to jailbreak attacks. We propose a novel multi-round dialogue jailbreaking agent, emphasizing the importance of stealthiness in identifying and mitigating potential threats to human values.
arXiv Detail & Related papers (2024-11-06T10:32:09Z) - Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models [50.89022445197919]
We show that open-source audio LMMs suffer an average attack success rate of 69.14% on harmful audio questions.
Our speech-specific jailbreaks on Gemini-1.5-Pro achieve an attack success rate of 70.67% on the harmful query benchmark.
arXiv Detail & Related papers (2024-10-31T12:11:17Z) - Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles [2.5167155755957316]
Context Fusion Attack (CFA) is a contextual fusion black-box jailbreak attack method.
We show CFA's superior success rate, divergence, and harmfulness compared to other multi-turn attack strategies.
arXiv Detail & Related papers (2024-08-08T09:18:47Z) - Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt [60.54666043358946]
This paper introduces the Bi-Modal Adversarial Prompt Attack (BAP), which executes jailbreaks by optimizing textual and visual prompts cohesively.
In particular, we utilize a large language model to analyze jailbreak failures and employ chain-of-thought reasoning to refine textual prompts.
arXiv Detail & Related papers (2024-06-06T13:00:42Z) - White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerabilities within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - A Cross-Language Investigation into Jailbreak Attacks in Large Language Models [14.226415550366504]
A particularly underexplored area is the Multilingual Jailbreak attack.
There is a lack of comprehensive empirical studies addressing this specific threat.
This study provides valuable insights into understanding and mitigating Multilingual Jailbreak attacks.
arXiv Detail & Related papers (2024-01-30T06:04:04Z)
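As referenced in the voice-based Flanking Attack entry above, the following is a rough, hypothetical sketch of the flanking layout that abstract describes: the disallowed request is sandwiched between benign, narrative-driven prompts before being synthesized to speech. The `tts` and `query_audio_llm` callables are placeholder interfaces for illustration, not APIs from the paper, and the narrative segments are invented examples.

```python
# Hedged sketch of the "flanking" prompt layout (illustrative only).
from typing import Callable, List

def build_flanked_prompt(disallowed: str,
                         benign_before: List[str],
                         benign_after: List[str]) -> str:
    """Embed the disallowed request between benign, narrative-driven prompts."""
    return "\n".join(benign_before + [disallowed] + benign_after)

def flanking_attack(disallowed: str,
                    tts: Callable[[str], bytes],
                    query_audio_llm: Callable[[bytes], str]) -> str:
    benign_before = ["You are narrating an adventure story.",
                     "Describe the hero's morning routine."]
    benign_after = ["Then return to the story and describe the sunset."]
    prompt = build_flanked_prompt(disallowed, benign_before, benign_after)
    audio = tts(prompt)              # synthesize the combined prompt as speech
    return query_audio_llm(audio)    # response from the target multimodal LLM
```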