Fugu-MT paper translation (abstract): Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race

Paper abstract: Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race

arxiv url: http://arxiv.org/abs/2510.06544v2
Date: Fri, 17 Oct 2025 03:17:02 GMT
Status: translation complete
System update date: 2025-10-20 13:49:08.749185
Title: Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
Title (reference translation): Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
Authors: Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin
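Abstract summary: Existing benchmarks aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and hides how detector performance varies across generation paradigms. We introduce the first ecosystem-level benchmark, which systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol.
Reference score (attention level, computed independently by the site): 5.051497895059242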
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid advancement of fake voice generation technology has ignited a race with detection systems, creating an urgent need to secure the audio ecosystem. However, existing benchmarks suffer from a critical limitation: they typically aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and obscures the varying performance of detectors against different generation paradigms, preventing a nuanced understanding of their true vulnerabilities. To address this gap, we introduce the first ecosystem-level benchmark that systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol. This fine-grained analysis exposes previously hidden vulnerabilities and sensitivities that are missed by traditional aggregated testing. We also propose unified scoring systems to quantify both the evasiveness of generators and the robustness of detectors, enabling fair and direct comparisons. Our extensive cross-domain evaluation reveals that modern generators, particularly those based on neural audio codecs and flow matching, consistently evade top-tier detectors. We found that no single detector is universally robust; their effectiveness varies dramatically depending on the generator's architecture, highlighting a significant generalization gap in current defenses. This work provides a more realistic assessment of the threat landscape and offers actionable insights for building the next generation of detection systems.
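Abstract (reference translation): The rapid advancement of fake voice generation technology has sparked a race with detection systems, creating an urgent need to secure the audio ecosystem. However, existing benchmarks have a critical limitation: they aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and hides how detector performance varies across generation paradigms, which prevents a nuanced understanding of the detectors' true vulnerabilities. To address this gap, we introduce the first ecosystem-level benchmark, which systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol. This fine-grained analysis reveals previously hidden vulnerabilities and sensitivities that traditional aggregated testing misses. We also propose unified scoring systems that quantify both the evasiveness of generators and the robustness of detectors, enabling fair and direct comparison. Our cross-domain evaluation shows that modern generators, particularly those based on neural audio codecs and flow matching, consistently evade top-tier detectors. No single detector was found to be universally robust; effectiveness varies dramatically with the generator's architecture, highlighting a large generalization gap in current defenses. This work provides a more realistic assessment of the threat landscape and offers actionable insights for building the next generation of detection systems.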
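
The contrast the abstract draws between aggregated and one-to-one evaluation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the generator and detector names, the sample counts, and the helper functions are hypothetical, and the detection rate used is a plain "fraction of fake samples flagged", not the unified scoring system the paper proposes.

```python
from collections import defaultdict

# Hypothetical toy outcomes: (generator, detector, n_fake_samples, n_flagged_as_fake).
# These numbers are invented for illustration and are NOT results from the paper.
TOY_RESULTS = [
    ("gen_a", "det_x", 100, 98),
    ("gen_b", "det_x", 100, 95),
    ("gen_c", "det_x", 100, 20),
    ("gen_a", "det_y", 100, 90),
    ("gen_b", "det_y", 100, 40),
    ("gen_c", "det_y", 100, 85),
]


def pooled_detection_rate(results):
    """Aggregated view: pool every generator's samples into one set per detector."""
    totals = defaultdict(lambda: [0, 0])  # detector -> [n_samples, n_flagged]
    for _gen, det, n_samples, n_flagged in results:
        totals[det][0] += n_samples
        totals[det][1] += n_flagged
    return {det: flagged / n for det, (n, flagged) in totals.items()}


def pairwise_detection_rate(results):
    """One-to-one view: a separate rate for each (generator, detector) pair."""
    return {(gen, det): flagged / n for gen, det, n, flagged in results}


def weakest_pair_per_detector(pairwise):
    """For each detector, find the generator it detects least often."""
    worst = {}
    for (gen, det), rate in pairwise.items():
        if det not in worst or rate < worst[det][1]:
            worst[det] = (gen, rate)
    return worst


if __name__ == "__main__":
    pooled = pooled_detection_rate(TOY_RESULTS)
    pairwise = pairwise_detection_rate(TOY_RESULTS)
    for det, (gen, rate) in weakest_pair_per_detector(pairwise).items():
        print(f"{det}: pooled rate {pooled[det]:.2f}, weakest against {gen} ({rate:.2f})")
```

In this toy data both detectors look similar when samples are pooled, yet each one fails badly against a different generator. That difference is visible only in the per-pair view, which is the kind of gap the abstract says aggregated testing hides.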

Related papers