Fugu-MT Paper Translation (Abstract): Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Paper overview: Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

arxiv url: http://arxiv.org/abs/2605.30031v1
Date: Thu, 28 May 2026 14:53:27 GMT
Status: Translation complete
System update date: 2026-05-30 02:45:56.404378
Title: Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
Title (reference translation): Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
Authors: Bo-Han Feng, Yu-Hsuan Li Liang, Chien-Feng Liu, You-Hsuan Chang, Yun-Nung Chen
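Abstract summary: Large audio-language models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline. This paper provides a unified taxonomy and a controlled empirical evaluation of LALM jailbreak attacks and defenses.
Reference score (independently computed attention score): 16.72528638767562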
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Audio Language Models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline, where unsafe behavior can be induced through semantics, acoustic style, signal artifacts, or internal representations. Existing work studies these risks under heterogeneous threat models and evaluation protocols, making it difficult to compare attack practicality or defense utility. This paper provides a unified taxonomy and a controlled empirical evaluation of LALM jailbreak attacks and defenses. We organize prior work into semantic, acoustic, signal, and embedding-layer attacks; guard-based, training-free, and training-based defenses; and cross-modal, audio-native, and interactive benchmarks. We then evaluate representative attacks and defenses across ten open-source LALMs, measuring not only attack success rate but also benign refusal and latency. Our results show that Acoustic Best-of-N reveals strong worst-case audio-space vulnerabilities, Narrative Framing is an effective low-latency semantic threat, and current defenses trade robustness against benign usability. These findings support cost- and utility-aware evaluation as a necessary complement to success-rate-only LALM safety benchmarks.
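Abstract (reference translation): Large audio-language models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline, in which unsafe behavior can be induced through semantics, acoustic style, signal artifacts, or internal representations. Existing studies examine these risks under heterogeneous threat models and evaluation protocols, which makes it difficult to compare attack practicality or defense utility. This paper provides a unified taxonomy and a controlled empirical evaluation of LALM jailbreak attacks and defenses. Prior work is organized into semantic, acoustic, signal, and embedding-layer attacks; guard-based, training-free, and training-based defenses; and cross-modal, audio-native, and interactive benchmarks. Representative attacks and defenses are then evaluated across ten open-source LALMs, measuring not only attack success rate but also benign refusal and latency. The results indicate that Acoustic Best-of-N reveals strong worst-case vulnerabilities in audio space, that Narrative Framing is an effective low-latency semantic threat, and that current defenses trade robustness against benign usability. These findings support cost- and utility-aware evaluation as a necessary complement to success-rate-only LALM safety benchmarks.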


Related papers