Fugu-MT Paper Translation (Abstract): Voice ''Cloning'' is Style Transfer

Paper abstract: Voice ''Cloning'' is Style Transfer

arxiv url: http://arxiv.org/abs/2605.16578v2
Date: Wed, 20 May 2026 16:52:07 GMT
Status: Translation complete
Last updated in system: 2026-05-21 14:55:44.219244
Title: Voice ''Cloning'' is Style Transfer
Title (reference translation): Voice ''Cloning'' is Style Transfer
Authors: Kaitlyn Zhou, Federico Bianchi, Martijn Bartelds, Anna Pot, Yongchan Kwon, James Zou
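Abstract summary: We show that, despite the term, voice cloning does not faithfully ''clone'' an individual's voice. Widely used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like than their sources.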
Reference score (independently computed attention score): 35.849322148450604
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Artificially generated speech is increasingly embedded in everyday life. Voice cloning in particular enables applications where identity preservation is important, such as completing a recording, dubbing in a new language, or preserving the voices of individuals with speech loss. However, in our work, we find that despite the term, voice cloning does not faithfully ''clone'' an individual's voice. Instead, we find that widely-used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like compared to their sources. Human annotators also report greater trust in cloned voices than source voices, and a greater willingness to disclose sensitive personal information to them. Our work furthermore shows that voice cloning leads to homogenization of speaker characteristics, as measured by reduced variance in accent, speaking rate, and the audio embedding space. Together, our results highlight a new set of limitations and risks of voice cloning technology and their potential impact on human behavior.
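Abstract (reference translation): Artificially generated speech is becoming part of everyday life. Voice cloning in particular enables applications in which preserving identity matters, such as completing a recording, dubbing into a new language, or preserving the voices of individuals with speech loss. In our work, however, we find that despite the term, voice cloning does not faithfully ''clone'' an individual's voice. Instead, widely used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like than their sources. Human annotators also report greater trust in cloned voices than in source voices, and a greater willingness to disclose sensitive personal information to them. Our work further shows that voice cloning leads to homogenization of speaker characteristics, as measured by reduced variance in accent, speaking rate, and the audio embedding space. Together, these results highlight a new set of limitations and risks of voice cloning technology and their potential impact on human behavior.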
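The abstract describes homogenization as reduced variance in the audio embedding space but does not give a formula. As a rough illustration only, one common way to quantify spread is the total variance (the sum of per-dimension variances) of a set of speaker embeddings, compared between source and cloned voices. The function names, the choice of total variance, and the toy 2-D vectors below are all assumptions made for this sketch, not the authors' method or data.

```python
from statistics import pvariance


def total_variance(embeddings):
    """Sum of per-dimension population variances (trace of the covariance matrix)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dims = list(zip(*embeddings))  # transpose: one tuple of values per dimension
    return sum(pvariance(d) for d in dims)


def variance_ratio(source, cloned):
    """Cloned-to-source spread; a value below 1.0 means the cloned set is less spread out."""
    return total_variance(cloned) / total_variance(source)


# Hypothetical toy data: three speakers with 2-D "embeddings" (not from the paper).
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
# Same three speakers after a made-up "cloning" step that pulls them toward the centre.
cloned = [[0.5, 0.5], [1.5, 0.5], [0.5, 1.5]]

ratio = variance_ratio(source, cloned)
print(f"variance ratio (cloned / source): {ratio:.2f}")
```

On this toy data the ratio is 0.25, meaning the cloned set keeps a quarter of the source spread. With real audio embeddings the same comparison would need embeddings from the same encoder for both sets, and a single scalar like this hides which dimensions collapse.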

Related papers