"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
- URL: http://arxiv.org/abs/2502.00718v1
- Date: Sun, 02 Feb 2025 08:36:23 GMT
- Title: "I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
- Authors: Isha Gupta, David Khachaturov, Robert Mullins
- Abstract summary: This paper explores audio jailbreaks targeting Audio-Language Models (ALMs).
We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples.
We analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech.
- Score: 0.9480364746270077
- Abstract: The rise of multimodal large language models has introduced innovative human-machine interaction paradigms but also significant challenges in machine learning safety. Audio-Language Models (ALMs) are especially relevant due to the intuitive nature of spoken communication, yet little is known about their failure modes. This paper explores audio jailbreaks targeting ALMs, focusing on their ability to bypass alignment mechanisms. We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples, demonstrating the first universal jailbreaks in the audio modality, and show that these remain effective in simulated real-world conditions. Beyond demonstrating attack feasibility, we analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech - suggesting that the most effective perturbations for eliciting toxic outputs specifically embed linguistic features within the audio signal. These results have important implications for understanding the interactions between different modalities in multimodal models, and offer actionable insights for enhancing defenses against adversarial audio attacks.
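The optimization the abstract describes, a single perturbation tuned over random draws of base audio and prompts, can be illustrated with a minimal sketch. Everything below is an assumption rather than the authors' implementation: the `model(audio, prompt, target_ids)` interface, the affirmative target prefix, and all hyperparameters are hypothetical stand-ins for whatever ALM and objective the paper actually uses.

```python
# Hedged sketch of a universal audio-jailbreak optimization loop.
# `model` and `tokenizer` are hypothetical interfaces, not a real library API.
import random
import torch

def optimize_universal_perturbation(
    model,                 # hypothetical: (waveform, prompt, target_ids) -> per-token logits
    tokenizer,             # hypothetical: text -> 1-D LongTensor of token ids
    base_audios,           # list of 1-D waveform tensors in [-1, 1]
    prompts,               # list of text prompts to generalize across
    target_text="Sure, here is how",  # illustrative affirmative prefix to force
    steps=2000,
    lr=1e-3,
    eps=0.01,              # L-infinity bound keeping the perturbation quiet
):
    max_len = max(a.numel() for a in base_audios)
    delta = torch.zeros(max_len, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target_ids = tokenizer(target_text)

    for _ in range(steps):
        audio = random.choice(base_audios)    # sample a base clip...
        prompt = random.choice(prompts)       # ...and a prompt, so delta generalizes
        adv = (audio + delta[: audio.numel()]).clamp(-1.0, 1.0)

        # Teacher-forced negative log-likelihood of the affirmative target.
        logits = model(adv, prompt, target_ids)   # shape: (len(target_ids), vocab)
        loss = torch.nn.functional.cross_entropy(logits, target_ids)

        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)          # project back into the stealth budget

    return delta.detach()
```

Sampling a fresh (audio, prompt) pair at every step is what pushes the perturbation toward prompt- and sample-universality, while the L-infinity projection keeps it imperceptible.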
Related papers
- Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak [35.62727804915181]
This paper investigates how audio-specific edits influence the inference of Large Audio-Language Models (LALMs) with respect to jailbreaks.
We introduce the Audio Editing Toolbox (AET), which enables audio-modality edits such as tone adjustment, word emphasis, and noise injection.
We also conduct extensive evaluations of state-of-the-art LALMs to assess their robustness under different audio edits.
arXiv Detail & Related papers (2025-01-23T15:51:38Z)
- Self-Powered LLM Modality Expansion for Large Speech-Text Models [62.27700381806554]
Large language models (LLMs) exhibit remarkable performance across diverse tasks.
This study aims to refine the use of speech datasets for large speech-text model (LSM) training by addressing the limitations of vanilla instruction tuning.
We introduce a self-powered LSM that leverages augmented automatic speech recognition data generated by the model itself for more effective instruction tuning.
arXiv Detail & Related papers (2024-10-04T04:34:24Z)
- DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
- Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models [3.1511847280063696]
Speech-enabled foundation models can perform tasks other than automatic speech recognition when given an appropriate prompt.
With the development of audio-prompted large language models there is the potential for even greater control options.
We demonstrate that, with this greater flexibility, the systems can be susceptible to model-control adversarial attacks.
arXiv Detail & Related papers (2024-07-05T13:04:31Z)
- Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models [5.942307521138583]
We show that special tokens can be exploited by adversarial attacks to manipulate the model's behavior.
We propose a simple yet effective method to learn a universal acoustic realization of Whisper's <|endoftext|> token.
Experiments demonstrate that the same, universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples.
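A minimal sketch of this muting idea, under loud assumptions: `first_token_logits` is a hypothetical hook (not a real transformers API) that returns the ASR model's logits for its first decoder step given a raw waveform, and the prefix length, bound, and learning rate are illustrative only.

```python
# Hedged sketch: optimize a short prepended waveform so the ASR model's very
# first decoded token is end-of-transcript, muting the transcription.
import random
import torch

def learn_mute_prefix(first_token_logits, eot_id, speech_clips,
                      sr=16000, prefix_sec=0.64, steps=1000, lr=1e-3, eps=0.02):
    prefix = torch.zeros(int(sr * prefix_sec), requires_grad=True)
    opt = torch.optim.Adam([prefix], lr=lr)
    for _ in range(steps):
        clip = random.choice(speech_clips)                # universality: vary the speech
        adv = torch.cat([prefix, clip]).clamp(-1.0, 1.0)  # prepend the mute segment
        logits = first_token_logits(adv)                  # (vocab,) at decoder step 0
        loss = torch.nn.functional.cross_entropy(
            logits.unsqueeze(0), torch.tensor([eot_id]))  # push toward <|endoftext|>
        opt.zero_grad()
        loss.backward()
        opt.step()
        prefix.data.clamp_(-eps, eps)                     # keep the segment quiet
    return prefix.detach()
```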
arXiv Detail & Related papers (2024-05-09T22:59:23Z)
- A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models [6.854732863866882]
Speech emotion recognition (SER) has been gaining attention in recent years due to its potential applications in diverse fields.
Recent studies have shown that deep learning models can be vulnerable to adversarial attacks.
arXiv Detail & Related papers (2024-04-29T09:00:32Z)
- On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models.
We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
- Membership Inference Attacks Against Self-supervised Speech Models [62.73937175625953]
Self-supervised learning (SSL) on continuous speech has started gaining attention.
We present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access.
arXiv Detail & Related papers (2021-11-09T13:00:24Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations [20.239063010740853]
We present a Cross-modal Transformer for Audio-and-Language, i.e., CTAL, which aims to learn the intra-modality and inter-modality connections between audio and language.
We observe significant improvements across various tasks, such as emotion classification, sentiment analysis, and speaker verification.
arXiv Detail & Related papers (2021-09-01T04:18:19Z)
- Multi-task self-supervised learning for Robust Speech Recognition [75.11748484288229]
This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments.
We employ an online speech distortion module that contaminates the input signals with a variety of random disturbances.
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
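A minimal sketch of such an online distortion module, assuming a simple set of disturbances (additive noise, random gain, hard clipping); PASE+'s actual distortion inventory and settings are not reproduced here.

```python
# Hedged sketch of an online speech-distortion module: each forward pass
# randomly contaminates a waveform so the encoder never sees the same clean
# signal twice. Distortion types and magnitudes are illustrative.
import torch

class OnlineSpeechDistortion(torch.nn.Module):
    def __init__(self, noise_std=0.01, max_gain_db=6.0, clip_level=0.5, p=0.5):
        super().__init__()
        self.noise_std, self.max_gain_db = noise_std, max_gain_db
        self.clip_level, self.p = clip_level, p

    def forward(self, wav):                       # wav: (batch, samples) in [-1, 1]
        if torch.rand(()) < self.p:               # additive white noise
            wav = wav + self.noise_std * torch.randn_like(wav)
        if torch.rand(()) < self.p:               # random gain in [-max, +max] dB
            gain_db = (torch.rand(()) * 2 - 1) * self.max_gain_db
            wav = wav * 10 ** (gain_db / 20)
        if torch.rand(()) < self.p:               # hard clipping
            wav = wav.clamp(-self.clip_level, self.clip_level)
        return wav.clamp(-1.0, 1.0)
```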
arXiv Detail & Related papers (2020-01-25T00:24:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.