Backdoor Attacks Against Speech Language Models
- URL: http://arxiv.org/abs/2510.01157v1
- Date: Wed, 01 Oct 2025 17:45:04 GMT
- Title: Backdoor Attacks Against Speech Language Models
- Authors: Alexandrine Fortier, Thomas Thebaud, Jesús Villalba, Najim Dehak, Patrick Cardinal
- Abstract summary: We present the first systematic study of audio backdoor attacks against speech language models. We demonstrate the attack's effectiveness across four speech encoders and three datasets, covering four tasks. We propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.
- Score: 63.07317091368079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enable multimodality is to cascade domain-specific encoders with an LLM, making the resulting model inherit vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate the attack's effectiveness across four speech encoders and three datasets, covering four tasks: automatic speech recognition (ASR), speech emotion recognition, and gender and age prediction. The attack consistently achieves high success rates, ranging from 90.76% to 99.41%. To better understand how backdoors propagate, we conduct a component-wise analysis to identify the most vulnerable stages of the pipeline. Finally, we propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.
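For a concrete picture of the threat model, below is a minimal sketch of generic dirty-label audio data poisoning of the kind such backdoor attacks build on: a fixed acoustic trigger is mixed into a small fraction of training clips, and those clips are relabeled to an attacker-chosen target. The trigger shape, poison rate, and target label here are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of dirty-label audio backdoor poisoning.
# All concrete choices (sine-tone trigger, 5% poison rate, target
# label) are assumptions for illustration, not the authors' setup.
import numpy as np

SAMPLE_RATE = 16_000
POISON_RATE = 0.05          # fraction of training clips to poison (assumed)
TARGET_LABEL = "happy"      # attacker-chosen output for triggered inputs (assumed)

def make_trigger(duration_s=0.5, freq_hz=4_000, amplitude=0.01):
    """A short, low-amplitude sine tone used as the backdoor trigger."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return (amplitude * np.sin(2 * np.pi * freq_hz * t)).astype(np.float32)

def poison_clip(waveform, trigger):
    """Additively mix the trigger into the start of a clip in [-1, 1]."""
    out = waveform.copy()
    n = min(len(trigger), len(out))
    out[:n] += trigger[:n]
    return np.clip(out, -1.0, 1.0)

def poison_dataset(clips, labels, rng=None):
    """Poison a random POISON_RATE fraction of (clip, label) pairs in place."""
    if rng is None:
        rng = np.random.default_rng(0)
    trigger = make_trigger()
    idx = rng.choice(len(clips), size=int(POISON_RATE * len(clips)), replace=False)
    for i in idx:
        clips[i] = poison_clip(clips[i], trigger)
        labels[i] = TARGET_LABEL  # dirty-label poisoning: relabel to the target
    return clips, labels
```

A model trained on such a set behaves normally on clean audio but tends to emit the target label whenever the trigger is present; attack success rate is conventionally the fraction of triggered inputs that yield the target output, which is how figures like the 90.76%-99.41% range above are typically reported. The fine-tuning defense mentioned in the abstract presumably counteracts a poisoned encoder by further fine-tuning on clean data, though the paper should be consulted for the exact procedure.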
Related papers
- Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack [0.0]
We propose a universal targeted latent-space attack on audio-language models. Our approach learns a universal perturbation that generalizes across inputs and speakers and does not require access to the language model.
arXiv Detail & Related papers (2025-12-29T21:56:13Z)
- BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models [63.5775877701015]
We introduce BackdoorVLM, the first comprehensive benchmark for evaluating backdoor attacks on vision-language models (VLMs). BackdoorVLM organizes multimodal backdoor threats into 5 representative categories: targeted refusal, malicious injection, jailbreak, concept substitution, and perceptual hijack. We evaluate these threats using 12 representative attack methods spanning text, image, and bimodal triggers, tested on 2 open-source VLMs and 3 multimodal datasets.
arXiv Detail & Related papers (2025-11-24T09:30:38Z)
- ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction [88.41471266579333]
We propose ELEGANCE, a novel framework that incorporates linguistic knowledge from large language models (LLMs) into AV-TSE models. Comprehensive experiments with RoBERTa, Qwen3-0.6B, and Qwen3-4B on two AV-TSE backbones show significant improvements.
arXiv Detail & Related papers (2025-11-09T08:50:11Z)
- Multi-level SSL Feature Gating for Audio Deepfake Detection [4.053610356853999]
Recent advancements in generative AI, particularly in speech synthesis, have enabled the generation of highly natural-sounding synthetic speech. These innovations pose significant risks, including misuse for fraudulent activities, identity theft, and security threats. Current research on spoofing detection countermeasures remains limited in its generalization to unseen deepfake attacks and languages. We propose a gating mechanism that extracts relevant features from the XLS-R speech foundation model, used as a front-end feature extractor.
arXiv Detail & Related papers (2025-09-03T15:37:52Z)
- SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models [4.67675814519416]
Speech-based human-computer interaction is vulnerable to backdoor attacks. In this paper, we propose that speech backdoor attacks can strategically focus on speech elements such as timbre and emotion. The proposed attack is called the Speech Prompt Backdoor Attack (SPBA).
arXiv Detail & Related papers (2025-06-10T02:01:00Z)
- Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework [6.002582335323663]
We present an adversarial attack targeting the speech input of aligned Multimodal Large Language Models (MLLMs) in a white-box scenario. We introduce a novel token-level attack that leverages access to the model's speech tokenization to generate adversarial token sequences. Our approach achieves an attack success rate of up to 89 percent across multiple restricted tasks.
arXiv Detail & Related papers (2025-05-24T20:46:36Z)
- Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs [6.8285467057172555]
We investigate universal acoustic adversarial attacks on speech LLMs. We find critical vulnerabilities in Qwen2-Audio and Granite-Speech. This highlights the need for more robust training strategies and improved resistance to adversarial attacks.
arXiv Detail & Related papers (2025-05-20T12:35:59Z)
- Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study [59.30098850050971]
This work evaluates LLM prompting-based detection across eight non-English languages. We show that while zero-shot and few-shot prompting lag behind fine-tuned encoder models on most of the real-world evaluation sets, they achieve better generalization on functional tests for hate speech detection.
arXiv Detail & Related papers (2025-05-09T16:00:01Z)
- Multilingual and Multi-Accent Jailbreaking of Audio LLMs [19.5428160851918]
Multi-AudioJail is the first systematic framework to exploit multilingual and multi-accent audio jailbreaks. We show how acoustic perturbations interact with cross-lingual phonetics to cause jailbreak success rates to surge. We plan to release our dataset to spur research into cross-modal defenses.
arXiv Detail & Related papers (2025-04-01T18:12:23Z)
- Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models [60.72029578488467]
Adversarial audio attacks pose a significant threat to the growing use of large audio-language models (LALMs) in human-machine interactions. We introduce the Chat-Audio Attacks benchmark, which includes four distinct types of audio attacks. We evaluate six state-of-the-art LALMs with voice interaction capabilities, including Gemini-1.5-Pro, GPT-4o, and others.
arXiv Detail & Related papers (2024-11-22T10:30:48Z)
- Pre-Finetuning for Few-Shot Emotional Speech Recognition [20.894029832911617]
We view speaker adaptation as a few-shot learning problem.
We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives.
arXiv Detail & Related papers (2023-02-24T22:38:54Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)