Fugu-MT Paper Translation (Abstract): AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?

Paper abstract: AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?

arxiv url: http://arxiv.org/abs/2606.21147v1
Date: Fri, 19 Jun 2026 06:37:32 GMT
Status: Retrieving information
Last updated in system: 2026-06-23 11:29:56.83954
Title: AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?
Title (reference translation): AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?
Authors: Jiaxi Yang, Chaewan Chun, Jason Lucas, Yuchen Yang, Dongwon Lee
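Abstract summary: Large Audio Language Models (LALMs) show strong performance across a wide range of audio tasks. Refusal mechanisms can lead to over-refusal, in which a model incorrectly rejects benign queries. AOR-Bench is the first over-refusal benchmark designed specifically for LALMs.
Reference score (independently computed attention score): 15.494511369878042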
License:
Abstract: Large Audio Language Models (LALMs) have demonstrated strong performance across a wide range of audio tasks. As they are increasingly deployed in real-world applications, ensuring their safety alignment has become more important. Although refusal mechanisms serve as a key safeguard by preventing LALMs from responding to harmful requests, they can also lead to over-refusal, where models incorrectly reject benign queries. This issue is especially challenging in the audio domain because speech that appears harmful in isolation may become benign when interpreted together with the surrounding acoustic context, such as background sounds. To study this problem, we introduce AOR-Bench (Audio Over-Refusal Benchmark), the first benchmark for over-refusal specifically designed for LALMs. AOR-Bench contains 3,000 pseudo-harmful audio samples across six scenario categories. Evaluating 12 representative LALMs from six major model families, we find that over-refusal is widespread (see the overall performance figure in the paper) and uncover several important patterns in their safety judgments. As a preliminary effort to mitigate this issue, we further explore two lightweight strategies, Chain-of-Thought and activation steering, to reduce over-refusal.
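Abstract (reference translation): Large Audio Language Models (LALMs) show strong performance across a wide range of audio tasks. As they are increasingly deployed in real-world applications, ensuring their safety is becoming ever more important. Refusal mechanisms act as a key safeguard by preventing LALMs from responding to harmful requests, but they can also lead to over-refusal, in which a model incorrectly rejects benign queries. This problem is especially difficult in the audio domain, because speech that seems harmful in isolation can become benign when interpreted together with the surrounding acoustic context, such as background sounds. To study this problem, we introduce AOR-Bench (Audio Over-Refusal Benchmark), the first over-refusal benchmark designed specifically for LALMs. AOR-Bench contains 3,000 pseudo-harmful audio samples across six scenario categories. Evaluating 12 LALMs from six major model families, we find that over-refusal is widespread (see the figure in the paper) and identify several important patterns in their safety judgments. As a preliminary effort to mitigate the issue, we explore two lightweight strategies, Chain-of-Thought and activation steering, for reducing over-refusal.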

Related papers