Related papers: Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models

Related papers

Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities [76.9327488986162]
Existing attacks against multimodal language models (MLLMs) primarily communicate instructions through text accompanied by adversarial images.<n>We exploit the capabilities of MLLMs to interpret non-textual instructions, specifically, adversarial images or audio generated by our novel method, Con Instruction.<n>Our method achieves the highest attack success rates, reaching 81.3% and 86.6% on LLaVA-v1.5 (13B)
arXiv Detail & Related papers (2025-05-31T13:11:14Z)
Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models [50.89022445197919]
Large Audio Language Models (LALMs) have extended the capabilities of Large Language Models (LLMs)<n>Recent research has revealed that LALMs remain vulnerable to harmful queries due to insufficient safety-alignment.
arXiv Detail & Related papers (2025-05-26T08:25:25Z)
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework [6.002582335323663]
We present an adversarial attack targeting the speech input of aligned Multimodal Large Language Models (MLLMs) in a white box scenario.<n>We introduce a novel token level attack that leverages access to the model's speech tokenization to generate adversarial token sequences.<n>Our approach achieves up to 89 percent attack success rate across multiple restricted tasks.
arXiv Detail & Related papers (2025-05-24T20:46:36Z)
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models [25.93179924805564]
We present JALMBench, the textitfirst comprehensive benchmark to assess the safety of Audio Language Models (ALMs) against jailbreak attacks.<n>Using JALMBench, we provide an in-depth analysis of attack efficiency, topic sensitivity, voice diversity, and attack representations.
arXiv Detail & Related papers (2025-05-23T07:29:55Z)
Multilingual and Multi-Accent Jailbreaking of Audio LLMs [19.5428160851918]
Multi-AudioJail is the first systematic framework to exploit multilingual and multi-accent audio jailbreaks. We show how acoustic perturbations interact with cross-lingual phonetics to cause jailbreak success rates to surge. We plan to release our dataset to spur research into cross-modal defenses.
arXiv Detail & Related papers (2025-04-01T18:12:23Z)
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks [46.25325034315104]
We propose a fine-grained benchmark SafeDialBench for evaluating the safety of Large Language Models (LLMs) Specifically, we design a two-tier hierarchical safety taxonomy that considers 6 safety dimensions and generates more than 4000 multi-turn dialogues in both Chinese and English under 22 dialogue scenarios. Notably, we construct an innovative assessment framework of LLMs, measuring capabilities in detecting, and handling unsafe information and maintaining consistency when facing jailbreak attacks.
arXiv Detail & Related papers (2025-02-16T12:08:08Z)
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models [0.9480364746270077]
This paper explores audio jailbreaks targeting Audio-Language Models (ALMs) We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples. We analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech.
arXiv Detail & Related papers (2025-02-02T08:36:23Z)
Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak [35.62727804915181]
This paper investigates how audio-specific edits influence Large Audio-Language Models (LALMs) inference regarding jailbreak. We introduce the Audio Editing Toolbox (AET), which enables audio-modality edits such as tone adjustment, word emphasis, and noise injection. We also conduct extensive evaluations of state-of-the-art LALMs to assess their robustness under different audio edits.
arXiv Detail & Related papers (2025-01-23T15:51:38Z)
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? [65.49972312524724]
multimodal large language models (MLLMs) have expanded their capabilities to include vision and audio modalities. Our proposed DeafTest reveals that MLLMs often struggle with simple tasks humans find trivial. We introduce AV-Odyssey Bench, a comprehensive audio-visual benchmark designed to assess whether those MLLMs can truly understand the audio-visual information.
arXiv Detail & Related papers (2024-12-03T17:41:23Z)
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models [60.72029578488467]
Adrial audio attacks pose a significant threat to the growing use of large audio-language models (LALMs) in human-machine interactions.<n>We introduce the Chat-Audio Attacks benchmark including four distinct types of audio attacks.<n>We evaluate six state-of-the-art LALMs with voice interaction capabilities, including Gemini-1.5-Pro, GPT-4o, and others.
arXiv Detail & Related papers (2024-11-22T10:30:48Z)
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves [64.46372846359694]
IDEATOR is a novel jailbreak method that autonomously generates malicious image-text pairs for black-box jailbreak attacks. Our benchmark results on 11 recently releasedVLMs reveal significant gaps in safety alignment. For instance, our challenge set achieves ASRs of 46.31% on GPT-4o and 19.65% on Claude-3.5-Sonnet.
arXiv Detail & Related papers (2024-10-29T07:15:56Z)
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning [64.56272011710735]
We propose a novel single-stage joint speech-text SFT approach on the low-rank adaptation (LoRA) of the large language models (LLMs) backbone. Compared to previous SpeechLMs with 7B or 13B parameters, our 3B model demonstrates superior performance across various speech benchmarks.
arXiv Detail & Related papers (2024-10-23T00:36:06Z)
Large Language Models Are Strong Audio-Visual Speech Recognition Learners [53.142635674428874]
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. We propose Llama-AVSR, a new MLLM with strong audio-visual speech recognition capabilities. We evaluate our proposed approach on LRS3, the largest public AVSR benchmark, and we achieve new state-of-the-art results for the tasks of ASR and AVSR with a WER of 0.81% and 0.77%, respectively.
arXiv Detail & Related papers (2024-09-18T21:17:27Z)
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy, covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity) Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models [34.557309967708406]
In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. We design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on spoken question-answering task, scoring over 80% on both safety and helpfulness metrics.
arXiv Detail & Related papers (2024-05-14T04:51:23Z)
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models [107.88745040504887]
We study the harmlessness alignment problem of multimodal large language models (MLLMs) Inspired by this, we propose a novel jailbreak method named HADES, which hides and amplifies the harmfulness of the malicious intent within the text input. Experimental results show that HADES can effectively jailbreak existing MLLMs, which achieves an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision.
arXiv Detail & Related papers (2024-03-14T18:24:55Z)
Identifying False Content and Hate Speech in Sinhala YouTube Videos by Analyzing the Audio [0.0]
This study proposes a solution to minimize the spread of violence and misinformation in Sinhala YouTube videos. The approach involves developing a rating system that assesses whether a video contains false information by comparing the title and description with the audio content.
arXiv Detail & Related papers (2024-01-30T08:08:34Z)
FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts [14.948652267916149]
We propose FigStep, a jailbreaking algorithm against large vision-language models (VLMs) Instead of feeding textual harmful instructions directly, FigStep converts the harmful content into images through typography. FigStep can achieve an average attack success rate of 82.50% on 500 harmful queries in 10 topics.
arXiv Detail & Related papers (2023-11-09T18:59:11Z)
Multilingual Jailbreak Challenges in Large Language Models [96.74878032417054]
In this study, we reveal the presence of multilingual jailbreak challenges within large language models (LLMs) We consider two potential risky scenarios: unintentional and intentional. We propose a novel textscSelf-Defense framework that automatically generates multilingual training data for safety fine-tuning.
arXiv Detail & Related papers (2023-10-10T09:44:06Z)
DeepSafety:Multi-level Audio-Text Feature Extraction and Fusion Approach for Violence Detection in Conversations [2.8038382295783943]
The choice of words and vocal cues in conversations presents an underexplored rich source of natural language data for personal safety and crime prevention. We introduce a new information fusion approach that extracts and fuses multi-level features including verbal, vocal, and text as heterogeneous sources of information to detect the extent of violent behaviours in conversations.
arXiv Detail & Related papers (2022-06-23T16:45:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.