MORE: Multi-Objective Adversarial Attacks on Speech Recognition
- URL: http://arxiv.org/abs/2601.01852v2
- Date: Wed, 14 Jan 2026 02:50:21 GMT
- Title: MORE: Multi-Objective Adversarial Attacks on Speech Recognition
- Authors: Xiaoxue Gao, Zexin Li, Yiming Chen, Nancy F. Chen
- Abstract summary: Large-scale automatic speech recognition (ASR) models such as Whisper have expanded their adoption across diverse real-world applications. Robustness against even minor input perturbations is therefore critical for maintaining reliable performance in real-time environments. We introduce MORE, a multi-objective repetitive doubling encouragement attack, which jointly degrades recognition accuracy and inference efficiency.
- Score: 39.77140497042348
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The emergence of large-scale automatic speech recognition (ASR) models such as Whisper has greatly expanded their adoption across diverse real-world applications. Ensuring robustness against even minor input perturbations is therefore critical for maintaining reliable performance in real-time environments. While prior work has mainly examined accuracy degradation under adversarial attacks, robustness with respect to efficiency remains largely unexplored. This narrow focus provides only a partial understanding of ASR model vulnerabilities. To address this gap, we conduct a comprehensive study of ASR robustness under multiple attack scenarios. We introduce MORE, a multi-objective repetitive doubling encouragement attack, which jointly degrades recognition accuracy and inference efficiency through a hierarchical staged repulsion-anchoring mechanism. Specifically, we reformulate multi-objective adversarial optimization into a hierarchical framework that achieves the dual objectives sequentially. To further amplify effectiveness, we propose a novel repetitive encouragement doubling objective (REDO) that induces duplicative text generation by maintaining accuracy degradation and periodically doubling the predicted sequence length. Overall, MORE compels ASR models to produce incorrect transcriptions at a substantially higher computational cost, triggered by a single adversarial input. Experiments show that MORE consistently yields significantly longer transcriptions while maintaining high word error rates compared to existing baselines, underscoring its effectiveness in multi-objective adversarial attacks.
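The staged objective described in the abstract can be sketched as a two-phase loss: phase one maximizes transcription error, and phase two anchors the achieved accuracy degradation while encouraging the predicted length to double periodically (REDO-style). This is a minimal illustrative sketch only; the function name `staged_more_loss`, the parameters (`ce_anchor`, `lam`, the doubling schedule), and the hinge-style terms are assumptions, not the paper's actual implementation.

```python
def staged_more_loss(ce_loss, pred_len, base_len, period, step, stage,
                     ce_anchor=0.0, lam=1.0):
    """Scalar to *minimize* when updating the adversarial perturbation.

    ce_loss   : model cross-entropy on the correct transcript
                (larger means worse accuracy)
    pred_len  : current predicted token-sequence length
    base_len  : predicted length at the start of stage 2
    period    : optimization steps between doublings of the length target
    step      : current optimization step in stage 2
    stage     : 1 = accuracy-degradation phase, 2 = efficiency phase
    ce_anchor : cross-entropy reached at the end of stage 1 (the anchor)
    lam       : weight of the length-doubling term in stage 2
    """
    if stage == 1:
        # Stage 1: minimizing -ce_loss pushes the model away from the
        # ground-truth transcript, i.e. maximizes transcription error.
        return -ce_loss
    # Stage 2: a hinge keeps accuracy at least as degraded as the anchor
    # (a zero penalty as long as ce_loss stays above ce_anchor)...
    anchor_term = max(0.0, ce_anchor - ce_loss)
    # ...while the length target doubles every `period` steps, so the
    # attack keeps demanding ever-longer (costlier) transcriptions.
    target_len = base_len * 2 ** (step // period)
    length_term = max(0.0, target_len - pred_len)
    return anchor_term + lam * length_term
```

In an actual attack these terms would be computed from the ASR model's differentiable outputs so that the perturbation can be updated by gradient descent; the sketch only shows how the two objectives are sequenced and combined.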
Related papers
- DUAP: Dual-task Universal Adversarial Perturbations Against Voice Control Systems [10.342045511863288]
We propose Dual-task Universal Adversarial Perturbation (DUAP). DUAP employs a targeted surrogate objective to effectively disrupt ASR transcription and introduces a Dynamic Normalized Ensemble (DNE) strategy to enhance transferability across diverse SR models. Extensive evaluations across five ASR and six SR models demonstrate that DUAP achieves high simultaneous attack success rates and superior imperceptibility.
arXiv Detail & Related papers (2026-01-19T07:39:22Z) - Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models [29.56923793047279]
We introduce Dynamic Outlier Truncation (DOT), a training-time intervention that selectively suppresses redundant tokens. DOT targets only the extreme tail of response lengths within fully correct rollout groups while preserving long-horizon reasoning capabilities. Our method reduces inference token usage by 78% while simultaneously increasing accuracy compared to the initial policy.
arXiv Detail & Related papers (2026-01-07T14:31:07Z) - Debiased Dual-Invariant Defense for Adversarially Robust Person Re-Identification [52.63017280231648]
Person re-identification (ReID) is a fundamental task in many real-world applications such as pedestrian trajectory tracking. Person ReID models are highly susceptible to adversarial attacks, where imperceptible perturbations to pedestrian images can cause entirely incorrect predictions. We propose a dual-invariant defense framework composed of two main phases.
arXiv Detail & Related papers (2025-11-13T03:56:40Z) - Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z) - PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance [10.105673138616483]
Large Language Models (LLMs) are increasingly integrated into real-world applications, from virtual assistants to autonomous agents. As attackers evolve with paraphrased, obfuscated, and even multi-task injection strategies, existing benchmarks are no longer sufficient to capture the full spectrum of emerging threats. We propose PromptSleuth, a semantic-oriented defense framework that detects prompt injection by reasoning over task-level intent rather than surface features.
arXiv Detail & Related papers (2025-08-28T15:19:07Z) - Sampling-aware Adversarial Attacks Against Large Language Models [52.30089653615172]
Existing adversarial attacks typically target harmful responses in single-point greedy generations. For the goal of eliciting harmful responses, we show that repeatedly sampling model outputs during attack prompt optimization is beneficial. Integrating sampling into existing attacks boosts success rates by up to 37% and improves efficiency by up to two orders of magnitude.
arXiv Detail & Related papers (2025-07-06T16:13:33Z) - Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation (RACE) is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z) - Multi-granular Adversarial Attacks against Black-box Neural Ranking Models [111.58315434849047]
We create high-quality adversarial examples by incorporating multi-granular perturbations.
We transform the multi-granular attack into a sequential decision-making process.
Our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.
arXiv Detail & Related papers (2024-04-02T02:08:29Z) - Multi-objective Evolutionary Search of Variable-length Composite Semantic Perturbations [1.9100854225243937]
We propose a novel method called multi-objective evolutionary search of variable-length composite semantic perturbations (MES-VCSP).
MES-VCSP can obtain adversarial examples with a higher attack success rate, more naturalness, and less time cost.
arXiv Detail & Related papers (2023-07-13T04:08:16Z) - LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack [11.764601181046496]
Deepfakes, malicious visual content created by generative models, pose an increasingly harmful threat to society.
To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs.
We propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process.
arXiv Detail & Related papers (2023-07-04T07:00:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.