Fugu-MT Paper Translation (Abstract): Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning

Paper overview: Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning

arxiv url: http://arxiv.org/abs/2511.14249v1
Date: Tue, 18 Nov 2025 08:39:44 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:53.012786
Title: Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning
Title (reference translation): Towards Authentic Movie Dubbing Using Retrieve-Augmented Director-Actor Interaction Learning
Authors: Rui Liu, Yuan Zhao, Zhenqi Jia
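Abstract summary: This paper proposes a new Retrieve-Augmented Director-Actor Interaction Learning scheme for authentic movie dubbing. The authors construct a multimodal Reference Footage library that simulates the learning footage provided by directors. An Emotion-Similarity-based Retrieval-Augmentation strategy retrieves the most relevant multimodal information aligned with the target silent video.
Reference score (attention score computed by this site): 11.98494175240752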
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The automatic movie dubbing model generates vivid speech from given scripts, replicating a speaker's timbre from a brief timbre prompt while ensuring lip-sync with the silent video. Existing approaches simulate a simplified workflow where actors dub directly without preparation, overlooking the critical director-actor interaction. In contrast, authentic workflows involve a dynamic collaboration: directors actively engage with actors, guiding them to internalize the context cues, specifically emotion, before performance. To address this issue, we propose a new Retrieve-Augmented Director-Actor Interaction Learning scheme to achieve authentic movie dubbing, termed Authentic-Dubber, which contains three novel mechanisms: (1) We construct a multimodal Reference Footage library to simulate the learning footage provided by directors. Note that we integrate Large Language Models (LLMs) to achieve deep comprehension of emotional representations across multimodal signals. (2) To emulate how actors efficiently and comprehensively internalize director-provided footage during dubbing, we propose an Emotion-Similarity-based Retrieval-Augmentation strategy. This strategy retrieves the most relevant multimodal information that aligns with the target silent video. (3) We develop a Progressive Graph-based speech generation approach that incrementally incorporates the retrieved multimodal emotional knowledge, thereby simulating the actor's final dubbing process. The above mechanisms enable the Authentic-Dubber to faithfully replicate the authentic dubbing workflow, achieving comprehensive improvements in emotional expressiveness. Both subjective and objective evaluations on the V2C Animation benchmark dataset validate the effectiveness. The code and demos are available at https://github.com/AI-S2-Lab/Authentic-Dubber.
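Abstract (reference translation): An automatic movie dubbing model generates vivid speech from a given script, replicating the speaker's timbre from a short timbre prompt while keeping the speech lip-synced with the silent video. Existing approaches simulate a simplified workflow in which actors dub directly without preparation, and so overlook the director-actor interaction. Authentic workflows, by contrast, involve dynamic collaboration: directors engage actively with actors and guide them to internalize contextual cues, emotion in particular, before the performance. To address this, the authors propose a new Retrieve-Augmented Director-Actor Interaction Learning scheme for authentic movie dubbing, called Authentic-Dubber, built on three mechanisms. (1) A multimodal Reference Footage library simulates the learning footage provided by directors; Large Language Models (LLMs) are integrated to achieve a deep understanding of emotional representations across multimodal signals. (2) An Emotion-Similarity-based Retrieval-Augmentation strategy emulates how actors efficiently and comprehensively internalize director-provided footage during dubbing, retrieving the multimodal information most relevant to the target silent video. (3) A Progressive Graph-based speech generation approach incrementally incorporates the retrieved multimodal emotional knowledge, simulating the actor's final dubbing process. Together these mechanisms let Authentic-Dubber faithfully reproduce the authentic dubbing workflow and achieve comprehensive improvements in emotional expressiveness. Subjective and objective evaluations on the V2C Animation benchmark dataset validate its effectiveness. The code and demos are available at https://github.com/AI-S2-Lab/Authentic-Dubber.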
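
The abstract does not specify how emotion similarity is computed or how many items are retrieved. As a rough illustration only, the retrieval step in mechanism (2) could be sketched as a top-k nearest-neighbour lookup over emotion embeddings. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the use of cosine similarity, the function names `cosine_similarity` and `retrieve_top_k`, the clip identifiers, and the toy 3-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, library, k=2):
    """Return the k library entries most similar to the query.

    `library` maps a hypothetical clip id to its emotion embedding.
    The result is a list of (clip_id, score) pairs, best match first.
    """
    scored = [
        (clip_id, cosine_similarity(query_embedding, embedding))
        for clip_id, embedding in library.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy reference-footage library with made-up emotion embeddings.
library = {
    "clip_happy": [0.9, 0.1, 0.0],
    "clip_sad": [0.1, 0.9, 0.0],
    "clip_angry": [0.0, 0.2, 0.9],
}

# Made-up emotion embedding for the target silent video.
query = [0.8, 0.2, 0.1]

top = retrieve_top_k(query, library, k=2)
for clip_id, score in top:
    print(clip_id, round(score, 3))
```

In this toy setup the query is closest to `clip_happy`, followed by `clip_sad`. A real system would obtain the embeddings from learned multimodal encoders rather than hand-written vectors.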
