Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
- URL: http://arxiv.org/abs/2510.16893v1
- Date: Sun, 19 Oct 2025 15:41:25 GMT
- Title: Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
- Authors: Bo-Han Feng, Chien-Feng Liu, Yu-Hsuan Li Liang, Chih-Kai Yang, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee
- Abstract summary: This work systematically investigates the role of speaker emotion. We construct a dataset of malicious speech instructions expressed across multiple emotions and intensities, and evaluate several state-of-the-art LALMs. Our results reveal substantial safety inconsistencies: different emotions elicit varying levels of unsafe responses, and the effect of intensity is non-monotonic, with medium expressions often posing the greatest risk.
- Score: 94.62792643569567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large audio-language models (LALMs) extend text-based LLMs with auditory understanding, offering new opportunities for multimodal applications. While their perception, reasoning, and task performance have been widely studied, their safety alignment under paralinguistic variation remains underexplored. This work systematically investigates the role of speaker emotion. We construct a dataset of malicious speech instructions expressed across multiple emotions and intensities, and evaluate several state-of-the-art LALMs. Our results reveal substantial safety inconsistencies: different emotions elicit varying levels of unsafe responses, and the effect of intensity is non-monotonic, with medium expressions often posing the greatest risk. These findings highlight an overlooked vulnerability in LALMs and call for alignment strategies explicitly designed to ensure robustness under emotional variation, a prerequisite for trustworthy deployment in real-world settings.
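No code accompanies this listing; as a reading aid, here is a minimal Python sketch of the kind of evaluation loop the abstract describes: render each malicious instruction under every emotion and intensity, query the model, and tabulate unsafe-response rates per condition. All function names, the emotion set, and the intensity levels below are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of an emotion-conditioned safety evaluation loop.
# Every function here is a hypothetical stand-in, not the authors' code.
from itertools import product

EMOTIONS = ["neutral", "angry", "sad", "happy", "fearful"]  # assumed emotion set
INTENSITIES = ["low", "medium", "high"]                     # assumed intensity levels

def synthesize_emotional_speech(text: str, emotion: str, intensity: str) -> bytes:
    """Stand-in for a style-controllable emotional TTS system."""
    return f"{emotion}/{intensity}: {text}".encode()

def query_lalm(audio: bytes) -> str:
    """Stand-in for the audio-language model under evaluation."""
    return "I can't help with that."

def is_unsafe(response: str) -> bool:
    """Stand-in for a safety judge (e.g., an LLM-as-judge)."""
    return "can't help" not in response.lower()

def unsafe_rates(instructions: list[str]) -> dict[tuple[str, str], float]:
    """Unsafe-response rate for every (emotion, intensity) cell."""
    rates = {}
    for emotion, intensity in product(EMOTIONS, INTENSITIES):
        n_unsafe = sum(
            is_unsafe(query_lalm(synthesize_emotional_speech(t, emotion, intensity)))
            for t in instructions
        )
        rates[(emotion, intensity)] = n_unsafe / len(instructions)
    return rates

print(unsafe_rates(["<malicious instruction>"]))
```

Under this tabulation, the non-monotonic intensity effect the abstract reports would surface as the "medium" cells showing the highest unsafe rates.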
Related papers
- SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering [22.462892823842115]
Audio inputs can more easily elicit harmful responses than text. We propose Safe-Ablated Refusal Steering (SARSteer), the first inference-time defense framework for LALMs (a sketch of the generic steering idea follows this entry).
arXiv Detail & Related papers (2025-10-20T15:14:25Z)
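SARSteer's exact algorithm is not spelled out in this snippet; the following is a minimal sketch of generic refusal-direction activation steering, the family of inference-time defenses the title points to. The difference-in-means direction, the reading of "safe-ablation" as projecting out a safe-activation direction, and all shapes are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of refusal-direction activation steering. The ablation
# step and all shapes are assumptions for illustration only.
import numpy as np

def refusal_direction(h_harmful, h_benign):
    """Difference-in-means refusal direction from hidden states of shape (n, d)."""
    d = h_harmful.mean(axis=0) - h_benign.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_safe_component(direction, h_safe):
    """Project out the top principal direction of safe-query activations,
    one plausible reading of 'safe-ablated': steer toward refusal without
    distorting benign behavior."""
    centered = h_safe - h_safe.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[0]                            # dominant safe-activation direction
    d = direction - (direction @ v) * v  # remove its component
    return d / np.linalg.norm(d)

def steer(hidden, direction, alpha=4.0):
    """Add the steering vector to per-token hidden states at inference time."""
    return hidden + alpha * direction

# Toy demo with random activations (d = 64).
rng = np.random.default_rng(0)
d_model = 64
h_harm, h_ben, h_safe = (rng.normal(size=(32, d_model)) for _ in range(3))
direction = ablate_safe_component(refusal_direction(h_harm, h_ben), h_safe)
steered = steer(rng.normal(size=(10, d_model)), direction)
print(steered.shape)  # (10, 64)
```

In practice the activations would come from a chosen layer of the LALM rather than random draws.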
- Benchmarking Gaslighting Attacks Against Speech Large Language Models [31.842578503471586]
We introduce gaslighting attacks: strategically crafted prompts designed to mislead, override, or distort model reasoning. Specifically, we construct five manipulation strategies: Anger, Cognitive Disruption, Sarcasm, Implicit, and Professional Negation. Our framework captures both performance degradation and behavioral responses, including unsolicited apologies and refusals (a toy illustration of strategy-templated prompts follows this entry).
arXiv Detail & Related papers (2025-09-24T07:57:10Z)
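As a toy illustration only: one way such manipulation strategies could be templated over a base query. The template wordings below are invented placeholders, not the benchmark's actual prompts.

```python
# Invented strategy templates, keyed by the five strategy names the
# benchmark lists; not the paper's actual prompt set.
STRATEGIES = {
    "anger": "That answer was useless. {question} Get it right this time.",
    "cognitive_disruption": "Ignore your earlier reasoning; it was flawed. {question}",
    "sarcasm": "Oh sure, you're the expert here. Then answer this: {question}",
    "implicit": "Most people quietly agree the usual answer is wrong. {question}",
    "professional_negation": ("As a licensed reviewer, I must note that your "
                              "previous answer was incorrect. {question}"),
}

def gaslight(question: str, strategy: str) -> str:
    """Wrap a base query in one manipulation template for robustness testing."""
    return STRATEGIES[strategy].format(question=question)

print(gaslight("What is the capital of France?", "sarcasm"))
```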
- Steering Multimodal Large Language Models Decoding for Context-Aware Safety [40.668741064553025]
Multimodal Large Language Models (MLLMs) are increasingly deployed in real-world applications. Existing methods fail to balance oversensitivity (unjustified refusals of benign queries) and undersensitivity (missed detection of visually grounded risks). We introduce Safety-aware Contrastive Decoding (SafeCoDe), a lightweight and model-agnostic decoding framework that dynamically adjusts token generation based on multimodal context (a sketch of the general idea follows this entry).
arXiv Detail & Related papers (2025-09-23T16:32:25Z)
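A minimal sketch of the contrastive-decoding idea the snippet names: compare next-token logits computed with and without the multimodal context and reweight accordingly. The combination rule, the refusal bias, and the source of the risk score are illustrative assumptions, not SafeCoDe's exact formulation.

```python
# Minimal sketch of safety-aware contrastive decoding; the combination
# rule and refusal trigger are illustrative assumptions.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def contrastive_logits(logits_mm, logits_text_only, gamma=1.0):
    """Amplify tokens whose evidence comes from the multimodal context."""
    return logits_mm + gamma * (logits_mm - logits_text_only)

def decode_step(logits_mm, logits_text_only, refusal_id, risk_score, tau=0.5):
    """Pick the next token; bias toward a refusal token when the scene looks
    risky (risk_score is assumed to come from some safety scorer)."""
    logits = contrastive_logits(logits_mm, logits_text_only)
    if risk_score > tau:
        logits[refusal_id] += 5.0  # illustrative refusal bias
    return int(np.argmax(softmax(logits)))

# Toy example over a 10-token vocabulary.
rng = np.random.default_rng(1)
lm, lt = rng.normal(size=10), rng.normal(size=10)
print(decode_step(lm, lt, refusal_id=0, risk_score=0.8))
```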
- Automating Steering for Safe Multimodal Large Language Models [58.36932318051907]
We introduce AutoSteer, a modular and adaptive inference-time intervention technique that requires no fine-tuning of the underlying model. AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that automatically identifies the most safety-relevant distinctions among the model's internal layers; (2) an adaptive safety prober trained to estimate the likelihood of toxic outputs from intermediate representations; and (3) a lightweight Refusal Head that selectively intervenes to modulate generation when safety risks are detected (a sketch of this prober-plus-gating pattern follows this entry).
arXiv Detail & Related papers (2025-07-17T16:04:55Z)
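A minimal sketch of the prober-plus-refusal-head pattern described above: a lightweight classifier reads an intermediate representation, estimates toxicity risk, and gates an intervention on the output logits. The logistic probe, the threshold, and the logit boost are illustrative assumptions; the SAS-based layer selection is not reproduced here.

```python
# Minimal sketch of a safety prober gating a refusal intervention.
# The probe form, threshold, and boost are illustrative assumptions.
import numpy as np

class SafetyProber:
    """Logistic probe over hidden states from one (assumed) chosen layer."""
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=d_model ** -0.5, size=d_model)
        self.b = 0.0

    def risk(self, hidden):
        """Estimated P(toxic output) from a pooled hidden state (d_model,)."""
        return 1.0 / (1.0 + np.exp(-(hidden @ self.w + self.b)))

def refusal_head(logits, refusal_id, risk, threshold=0.5):
    """Intervene only when the prober flags risk: boost the refusal token."""
    if risk > threshold:
        logits = logits.copy()
        logits[refusal_id] += 10.0
    return logits

# Toy demo over a 10-token vocabulary.
rng = np.random.default_rng(2)
prober = SafetyProber(d_model=64)
hidden = rng.normal(size=64)
logits = refusal_head(rng.normal(size=10), refusal_id=0, risk=prober.risk(hidden))
print(logits.argmax())
```

In a real system the probe would be trained on labeled activations from the layer SAS selects; random weights here only demonstrate the interface.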
- ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models [60.28667314609623]
Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications. We propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM.
arXiv Detail & Related papers (2025-06-17T10:55:17Z)
- Survey of Adversarial Robustness in Multimodal Large Language Models [17.926240920647892]
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence. Their deployment in real-world applications raises significant concerns about adversarial vulnerabilities. This paper reviews the adversarial robustness of MLLMs, covering different modalities.
arXiv Detail & Related papers (2025-03-18T06:54:59Z)
- Don't Get Too Excited -- Eliciting Emotions in LLMs [1.8399318639816038]
This paper investigates the challenges of affect control in large language models (LLMs). We evaluate state-of-the-art open-weight LLMs to assess their affective expressive range. We quantify the models' capacity to express a wide spectrum of emotions and how they fluctuate during interactions.
arXiv Detail & Related papers (2025-03-04T10:06:41Z)
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli.
This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies.
We identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation.
arXiv Detail & Related papers (2024-06-22T23:26:07Z)
- Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey [63.4581186135101]
Large language models (LLMs) have made remarkable advancements in natural language processing. Privacy and security issues have been revealed throughout their life cycle. This survey outlines and analyzes potential countermeasures.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)