Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World
- URL: http://arxiv.org/abs/2507.06256v1
- Date: Mon, 07 Jul 2025 07:29:52 GMT
- Title: Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World
- Authors: Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews, Lun Wang
- Abstract summary: This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. An adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors. We show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality.
- Score: 54.68651652564436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., "Hey Qwen") or triggering harmful behaviors (e.g., "Change my calendar event"). Subsequently, we show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality. Crucially, our research illustrates the scalability of these attacks to real-world scenarios, impacting other innocent users when these adversarial noises are played through the air. Further, we discuss the transferability of the attack and potential defensive measures.
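At its core, the attack described in the abstract is a gradient-based search for a small additive noise that pushes the model toward an attacker-chosen output. The sketch below illustrates that general recipe with a projected-gradient (PGD-style) loop under an L-infinity budget; the `DummyKeywordHead` stand-in model, the cross-entropy objective, and all hyperparameters are illustrative assumptions, not the paper's actual setup, which targets full ALLMs such as Qwen2-Audio and plays the resulting noise over the air.

```python
# Illustrative sketch only: a generic L-infinity PGD loop on a raw waveform.
# A tiny dummy classifier stands in for the victim model's "wake-keyword"
# decision; the real paper's objective and victim model are not reproduced here.
import torch
import torch.nn as nn

class DummyKeywordHead(nn.Module):
    """Stand-in for an ALLM scoring head: waveform -> logits over 2 classes."""
    def __init__(self, n_samples: int = 16000):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_samples, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        return self.net(wav)

def craft_perturbation(model, wav, target, eps=0.005, alpha=0.001, steps=100):
    """PGD: find additive noise delta (|delta|_inf <= eps) that steers the
    model toward the attacker's target label when added to the waveform."""
    delta = torch.zeros_like(wav, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        logits = model(torch.clamp(wav + delta, -1.0, 1.0))
        loss = loss_fn(logits, target)          # loss toward the attacker's target
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend toward the target behavior
            delta.clamp_(-eps, eps)             # keep the noise small (L-inf ball)
        delta.grad.zero_()
    return delta.detach()

if __name__ == "__main__":
    model = DummyKeywordHead()
    clean = torch.rand(1, 16000) * 2 - 1        # 1 second of fake 16 kHz audio in [-1, 1]
    target = torch.tensor([1])                  # attacker-chosen behavior ("wake up")
    noise = craft_perturbation(model, clean, target)
    print("max |delta|:", noise.abs().max().item())
```

In practice the same loop would presumably be wrapped around the victim ALLM's own loss (e.g., the likelihood of the attacker's target response), and the perturbation would additionally need to be made robust to room acoustics before being played over the air, as the abstract's real-world results require.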
Related papers
- When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs [1.911526481015]
Our research introduces WhisperInject, a two-stage adversarial audio attack framework. It can manipulate state-of-the-art audio language models to generate harmful content. Our method uses imperceptible perturbations in audio inputs that remain benign to human listeners.
arXiv Detail & Related papers (2025-08-05T12:14:01Z) - Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers [40.4026420070893]
We introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, assessing nine distinct risk types.
arXiv Detail & Related papers (2025-08-04T08:15:16Z) - Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs [6.8285467057172555]
We investigate universal acoustic adversarial attacks on speech LLMs. We find critical vulnerabilities in Qwen2-Audio and Granite-Speech. This highlights the need for more robust training strategies and improved resistance to adversarial attacks.
arXiv Detail & Related papers (2025-05-20T12:35:59Z) - Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models [60.72029578488467]
Adversarial audio attacks pose a significant threat to the growing use of large audio-language models (LALMs) in human-machine interactions. We introduce the Chat-Audio Attacks benchmark, which includes four distinct types of audio attacks. We evaluate six state-of-the-art LALMs with voice interaction capabilities, including Gemini-1.5-Pro, GPT-4o, and others.
arXiv Detail & Related papers (2024-11-22T10:30:48Z) - Can DeepFake Speech be Reliably Detected? [17.10792531439146]
This work presents the first systematic study of active malicious attacks against state-of-the-art open-source deepfake speech detectors.
The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.
arXiv Detail & Related papers (2024-10-09T06:13:48Z) - Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer [8.948537516293328]
We propose an attack on Automatic Speech Recognition (ASR) systems based on user-customized style transfer.
Our method can meet the need for user-customized styles and achieves an attack success rate of 82%.
arXiv Detail & Related papers (2024-05-15T16:05:24Z) - Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks [55.603893267803265]
Large Language Models (LLMs) are susceptible to Jailbreaking attacks.
Jailbreaking attacks aim to extract harmful information by subtly modifying the attack query.
We focus on a new attack form, called Contextual Interaction Attack.
arXiv Detail & Related papers (2024-02-14T13:45:19Z) - Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z) - Towards Adversarially Robust Deep Image Denoising [199.2458715635285]
This work systematically investigates the adversarial robustness of deep image denoisers (DIDs).
We propose a novel adversarial attack, namely Observation-based Zero-mean Attack (ObsAtk), to craft adversarial zero-mean perturbations on given noisy images.
To robustify DIDs, we propose hybrid adversarial training (HAT) that jointly trains DIDs with adversarial and non-adversarial noisy data.
arXiv Detail & Related papers (2022-01-12T10:23:14Z) - Attack on practical speaker verification system using universal adversarial perturbations [20.38185341318529]
This work shows that by playing our crafted adversarial perturbation as a separate source when the adversary is speaking, the practical speaker verification system will misjudge the adversary as a target speaker.
A two-step algorithm is proposed to optimize the universal adversarial perturbation so that it is text-independent and has little effect on recognition of the authentication text.
arXiv Detail & Related papers (2021-05-19T09:43:34Z) - Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method could significantly enhance adversarial robustness in comparison to previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-19T10:42:24Z) - Detecting Audio Attacks on ASR Systems with Dropout Uncertainty [40.9172128924305]
We show that our defense is able to detect attacks created through optimized perturbations and frequency masking.
We test our defense on Mozilla's CommonVoice dataset, the UrbanSound dataset, and an excerpt of the LibriSpeech dataset.
arXiv Detail & Related papers (2020-06-02T19:40:38Z)
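The last entry above detects audio attacks via dropout uncertainty. Below is a minimal sketch of that general idea (Monte-Carlo dropout: keep dropout active at inference, run several stochastic forward passes, and flag inputs whose predictions disagree); the `DummyAcousticModel`, feature shapes, and threshold are assumptions for illustration, not the paper's actual detector or datasets.

```python
# Minimal sketch of dropout-uncertainty detection: run the model several times
# with dropout left active and flag inputs whose predictions vary strongly.
import torch
import torch.nn as nn

class DummyAcousticModel(nn.Module):
    """Stand-in acoustic classifier with dropout; a real ASR/ALLM would replace this."""
    def __init__(self, n_feats: int = 128, n_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feats, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, n_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def dropout_uncertainty(model: nn.Module, x: torch.Tensor, n_runs: int = 20) -> torch.Tensor:
    """Mean per-class variance of softmax outputs over stochastic forward passes."""
    model.train()  # keep dropout active at inference time (Monte-Carlo dropout)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_runs)])
    return probs.var(dim=0).mean(dim=-1)  # high variance -> likely adversarial input

if __name__ == "__main__":
    model = DummyAcousticModel()
    features = torch.randn(4, 128)   # e.g. pooled audio features, one row per utterance
    scores = dropout_uncertainty(model, features)
    flagged = scores > 0.01          # threshold would be tuned on clean, benign data
    print(scores, flagged)
```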