Related papers: Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack

URL: http://arxiv.org/abs/2512.23881v1
Date: Mon, 29 Dec 2025 21:56:13 GMT
Title: Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack
Authors: Roee Ziv, Raz Lapid, Moshe Sipper
Abstract summary: We propose a universal targeted latent space attack on audio-language models. Our approach learns a universal perturbation that generalizes across inputs and speakers and does not require access to the language model.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security vulnerabilities. We propose a universal targeted latent space attack, an encoder-level adversarial attack that manipulates audio latent representations to induce attacker-specified outputs in downstream language generation. Unlike prior waveform-level or input-specific attacks, our approach learns a universal perturbation that generalizes across inputs and speakers and does not require access to the language model. Experiments on Qwen2-Audio-7B-Instruct demonstrate consistently high attack success rates with minimal perceptual distortion, revealing a critical and previously underexplored attack surface at the encoder level of multimodal systems.

Related papers

Backdoor Attacks Against Speech Language Models [63.07317091368079]
We present the first systematic study of audio backdoor attacks against speech language models. We demonstrate its effectiveness across four speech encoders and three datasets, covering four tasks. We propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.
arXiv Detail & Related papers (2025-10-01T17:45:04Z)
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs [1.911526481015]
Our research introduces WhisperInject, a two-stage adversarial audio attack framework. It can manipulate state-of-the-art audio language models to generate harmful content. Our method uses imperceptible perturbations in audio inputs that remain benign to human listeners.
arXiv Detail & Related papers (2025-08-05T12:14:01Z)
Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs [6.8285467057172555]
We investigate universal acoustic adversarial attacks on speech LLMs. We find critical vulnerabilities in Qwen2-Audio and Granite-Speech. This highlights the need for more robust training strategies and improved resistance to adversarial attacks.
arXiv Detail & Related papers (2025-05-20T12:35:59Z)
Multilingual and Multi-Accent Jailbreaking of Audio LLMs [19.5428160851918]
Multi-AudioJail is the first systematic framework to exploit multilingual and multi-accent audio jailbreaks. We show how acoustic perturbations interact with cross-lingual phonetics to cause jailbreak success rates to surge. We plan to release our dataset to spur research into cross-modal defenses.
arXiv Detail & Related papers (2025-04-01T18:12:23Z)
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks [59.87470192277124]
This paper explores methods of compromising speech translation systems through imperceptible audio manipulations. We present two innovative approaches: (1) the injection of perturbation into source audio, and (2) the generation of adversarial music designed to guide targeted translation. Our experiments reveal that carefully crafted audio perturbations can mislead translation models to produce targeted, harmful outputs, while adversarial music achieves this goal more covertly. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems.
arXiv Detail & Related papers (2025-03-02T16:38:16Z)
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models [0.9480364746270077]
This paper explores audio jailbreaks targeting Audio-Language Models (ALMs). We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples. We analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech.
arXiv Detail & Related papers (2025-02-02T08:36:23Z)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks. We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio. We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.