Fugu-MT Paper Translation (Abstract): Audio Deepfake Detection in the Age of Advanced Text-to-Speech models

Paper overview: Audio Deepfake Detection in the Age of Advanced Text-to-Speech models

arxiv url: http://arxiv.org/abs/2601.20510v1
Date: Wed, 28 Jan 2026 11:39:40 GMT
Status: Translation complete
Last updated in system: 2026-01-29 15:46:06.912847
Title: Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
Title (reference translation): Audio Deepfake Detection in the Age of Advanced Text-to-Speech Models
Authors: Robin Singh, Aditya Yogesh Nair, Fabio Palumbo, Florian Barbaro, Anna Dyka, Lohith Rachakonda
Abstract summary: Recent advances in text-to-speech (TTS) systems have substantially increased the realism of synthetic speech.
Reference score (independently computed interest level): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Text-to-Speech (TTS) systems have substantially increased the realism of synthetic speech, raising new challenges for audio deepfake detection. This work presents a comparative evaluation of three state-of-the-art TTS models--Dia2, Maya1, and MeloTTS--representing streaming, LLM-based, and non-autoregressive architectures. A corpus of 12,000 synthetic audio samples was generated using the Daily-Dialog dataset and evaluated against four detection frameworks, including semantic, structural, and signal-level approaches. The results reveal significant variability in detector performance across generative mechanisms: models effective against one TTS architecture may fail against others, particularly LLM-based synthesis. In contrast, a multi-view detection approach combining complementary analysis levels demonstrates robust performance across all evaluated models. These findings highlight the limitations of single-paradigm detectors and emphasize the necessity of integrated detection strategies to address the evolving landscape of audio deepfake threats.
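Abstract (reference translation): Recent advances in text-to-speech (TTS) systems have greatly improved the realism of synthetic speech, raising new challenges for audio deepfake detection. This work presents a comparative evaluation of three state-of-the-art TTS models: Dia2, Maya1, and MeloTTS. A corpus of 12,000 audio samples was generated from the Daily-Dialog dataset and evaluated against four detection frameworks spanning semantic, structural, and signal-level approaches. The results indicate that a model effective against one TTS architecture may fail against others, particularly LLM-based synthesis. In contrast, a multi-view detection approach that combines complementary analysis levels shows robust performance across all evaluated models. These findings highlight the limitations of single-paradigm detectors and underscore the need for integrated detection strategies to address the evolving threat of audio deepfakes.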

Related papers