Fugu-MT Paper Translation (Abstract): AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

Paper abstract: AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

arxiv url: http://arxiv.org/abs/2509.04345v1
Date: Thu, 04 Sep 2025 16:03:44 GMT
Status: Translation complete
Last updated in system: 2025-09-05 20:21:10.215585
Title: AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
Authors: Qizhou Wang, Hanxun Huang, Guansong Pang, Sarah Erfani, Christopher Leckie
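Abstract summary: AUDETER is a large-scale, highly diverse deepfake audio dataset. It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders, covering a broad range of TTS/vocoder patterns. It is the largest deepfake audio dataset by scale.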
Reference score (independently computed attention score): 38.75029700407531
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speech generation systems can produce remarkably realistic vocalisations that are often indistinguishable from human speech, posing significant authenticity challenges. Although numerous deepfake detection methods have been developed, their effectiveness in real-world environments remains unreliable due to the domain shift between training and test samples arising from diverse human speech and fast-evolving speech synthesis systems. This is not adequately addressed by current datasets, which lack real-world application challenges with diverse and up-to-date audio in both real and deepfake categories. To fill this gap, we introduce AUDETER (AUdio DEepfake TEst Range), a large-scale, highly diverse deepfake audio dataset for comprehensive evaluation and robust development of generalised models for deepfake audio detection. It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders with a broad range of TTS/vocoder patterns, totalling 3 million audio clips, making it the largest deepfake audio dataset by scale. Through extensive experiments with AUDETER, we reveal that i) state-of-the-art (SOTA) methods trained on existing datasets struggle to generalise to novel deepfake audio samples and suffer from high false positive rates on unseen human voice, underscoring the need for a comprehensive dataset; and ii) these methods trained on AUDETER achieve highly generalised detection performance and significantly reduce detection error rate by 44.1% to 51.6%, achieving an error rate of only 4.17% on diverse cross-domain samples in the popular In-the-Wild dataset, paving the way for training generalist deepfake audio detectors. AUDETER is available on GitHub.

Related papers