Fugu-MT Paper Translation (Abstract): Protecting Bystander Privacy via Selective Hearing in LALMs

Paper overview: Protecting Bystander Privacy via Selective Hearing in LALMs

arxiv url: http://arxiv.org/abs/2512.06380v1
Date: Sat, 06 Dec 2025 10:24:04 GMT
Status: Translation complete
Last updated in system: 2025-12-09 22:03:54.333066
Title: Protecting Bystander Privacy via Selective Hearing in LALMs
Authors: Xiao Zhan, Guangzhi Sun, Jose Such, Phil Woodland
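Abstract summary: Large audio language models (LALMs) are increasingly deployed in real-world environments where they inevitably capture speech from unintended nearby bystanders. We introduce SH-Bench, the first benchmark for evaluating selective hearing. We propose Selective Efficacy (SE), a unified metric capturing both multi-speaker comprehension and bystander-privacy protection.
Reference score (attention score computed independently by this site): 14.82452941000742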
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large audio language models (LALMs) are increasingly deployed in real-world settings where they inevitably capture speech from unintended nearby bystanders, raising privacy risks that existing benchmarks and defences largely overlook. We introduce SH-Bench, the first benchmark designed to evaluate selective hearing: a model's ability to attend to an intended main speaker while refusing to process or reveal information about incidental bystander speech. SH-Bench contains 3,968 multi-speaker audio mixtures spanning both real-world and synthetic scenarios, paired with 77k multiple-choice questions that probe models under general and selective operating modes. We propose Selective Efficacy (SE), a unified metric capturing both multi-speaker comprehension and bystander-privacy protection. Our evaluation of state-of-the-art open-source and proprietary LALMs reveals substantial privacy leakage, with strong audio understanding failing to translate into selective protection of bystander privacy. To mitigate this gap, we introduce Bystander Privacy Fine-Tuning (BPFT), a training pipeline that teaches models to refuse bystander-related queries without degrading main-speaker comprehension. BPFT yields substantial gains, improving SE by up to 15.9% over Gemini 2.5 Pro, demonstrating that selective hearing is learnable but far from achieved in current LALMs. SH-Bench and BPFT provide the first systematic framework for measuring and improving bystander privacy in audio foundation models.
