Fugu-MT Paper Translation (Abstract): VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

Paper overview: VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

arxiv url: http://arxiv.org/abs/2601.19956v1
Date: Tue, 27 Jan 2026 06:22:14 GMT
Status: Translation complete
Last updated in system: 2026-01-29 15:46:06.608333
Title: VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
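Title (reference translation): VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Models
Authors: Yuxiang Wang, Hongyu Liu, Dekun Chen, Xueyao Zhang, Zhizheng Wu
Abstract summary: Speech language models (SLMs) are expected to distinguish between users in order to manage information flow appropriately. Current SLM benchmarks test dialogue ability but overlook speaker identity. We introduce VoxPrivacy, the first benchmark designed to evaluate interactional privacy in SLMs.
Reference score (attention score computed independently by this site): 25.266028200777317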
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Speech Language Models (SLMs) transition from personal devices to shared, multi-user environments such as smart homes, a new challenge emerges: the model is expected to distinguish between users to manage information flow appropriately. Without this capability, an SLM could reveal one user's confidential schedule to another, a privacy failure we term interactional privacy. Thus, the ability to generate speaker-aware responses becomes essential for SLM safe deployment. Current SLM benchmarks test dialogue ability but overlook speaker identity. Multi-speaker benchmarks check who said what without assessing whether SLMs adapt their responses. Privacy benchmarks focus on globally sensitive data (e.g., bank passwords) while neglecting contextual privacy-sensitive information (e.g., a user's private appointment). To address this gap, we introduce VoxPrivacy, the first benchmark designed to evaluate interactional privacy in SLMs. VoxPrivacy spans three tiers of increasing difficulty, from following direct secrecy commands to proactively protecting privacy. Our evaluation of nine SLMs on a 32-hour bilingual dataset reveals a widespread vulnerability: most open-source models perform close to random chance (around 50% accuracy) on conditional privacy decisions, while even strong closed-source systems fall short on proactive privacy inference. We further validate these findings on Real-VoxPrivacy, a human-recorded subset, confirming that failures observed on synthetic data persist in real speech. Finally, we demonstrate a viable path forward: by fine-tuning on a new 4,000-hour training set, we improve privacy-preserving abilities while maintaining robustness. To support future work, we release the VoxPrivacy benchmark, the large-scale training set, and the fine-tuned model to foster the development of safer and more context-aware SLMs.
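Abstract (reference translation): As speech language models (SLMs) move from personal devices to shared multi-user environments such as smart homes, a new challenge emerges: the model is expected to distinguish between users to manage information flow appropriately. Without this capability, an SLM could disclose one user's confidential schedule to another user. The ability to generate speaker-aware responses is therefore essential for safe SLM deployment. Current SLM benchmarks test dialogue ability but overlook speaker identity. Multi-speaker benchmarks check who said what without assessing whether SLMs adapt their responses. Privacy benchmarks focus on globally sensitive data (e.g., bank passwords) while neglecting contextually privacy-sensitive information (e.g., a user's private appointment). To address this gap, we introduce VoxPrivacy, the first benchmark designed to evaluate interactional privacy in SLMs. VoxPrivacy spans three tiers of increasing difficulty, from following direct secrecy commands to proactively protecting privacy. Most open-source models perform close to random chance (around 50% accuracy) on conditional privacy decisions, while even strong closed-source systems fall short on proactive privacy inference. We further validate these findings on Real-VoxPrivacy, a human-recorded subset, confirming that failures observed on synthetic data persist in real speech. By fine-tuning on a new 4,000-hour training set, we improve privacy-preserving abilities while maintaining robustness. To support future work, we release the VoxPrivacy benchmark, the large-scale training set, and the fine-tuned model to foster the development of safer and more context-aware SLMs.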

Related papers