Fugu-MT Paper Translation (Abstract): Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

Paper overview: Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

arxiv url: http://arxiv.org/abs/2604.10030v1
Date: Sat, 11 Apr 2026 04:59:06 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:15.800389
Title: Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
Authors: Gordon Chen, Ziqi Huang, Ziwei Liu
Abstract summary: We propose Prompt Relay, an inference-time, plug-and-play method. Prompt Relay introduces a penalty into the cross-attention mechanism so that each temporal segment attends only to its assigned prompt.
Reference score (independently computed attention metric): 40.694968116482315
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video diffusion models have achieved remarkable progress in generating high-quality videos. However, these models struggle to represent the temporal succession of multiple events in real-world videos and lack explicit mechanisms to control when semantic concepts appear, how long they persist, and the order in which multiple events occur. Such control is especially important for movie-grade video synthesis, where coherent storytelling depends on precise timing, duration, and transitions between events. When using a single paragraph-style prompt to describe a sequence of complex events, models often exhibit semantic entanglement, where concepts intended for different moments in the video bleed into one another, resulting in poor text-video alignment. To address these limitations, we propose Prompt Relay, an inference-time, plug-and-play method to enable fine-grained temporal control in multi-event video generation, requiring no architectural modifications and no additional computational overhead. Prompt Relay introduces a penalty into the cross-attention mechanism, so that each temporal segment attends only to its assigned prompt, allowing the model to represent one semantic concept at a time and thereby improving temporal prompt alignment, reducing semantic interference, and enhancing visual quality.
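The cross-attention penalty described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the penalty, so the version below assumes a hard additive bias on the attention logits: zero where a video frame and a text token belong to the same temporal segment, and a large negative constant otherwise. The function names (`segment_penalty`, `cross_attention`), the segment layout, and the penalty value are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random


def segment_penalty(frame_segment, token_segment, penalty=1e4):
    """Additive logit bias: 0 where frame and token share a segment, -penalty otherwise.

    The hard constant penalty is an assumption; the abstract only says
    that "a penalty" is introduced into cross-attention.
    """
    return [[0.0 if fs == ts else -penalty for ts in token_segment]
            for fs in frame_segment]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(q, k, v, bias):
    """Scaled dot-product cross-attention with an additive bias on the logits."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out, weights = [], []
    for q_i, bias_row in zip(q, bias):
        logits = [scale * sum(a * b for a, b in zip(q_i, k_j)) + b_ij
                  for k_j, b_ij in zip(k, bias_row)]
        w = softmax(logits)
        weights.append(w)
        out.append([sum(w_j * v_j[d] for w_j, v_j in zip(w, v))
                    for d in range(len(v[0]))])
    return out, weights


def rand_matrix(rows, cols):
    """Random Gaussian matrix standing in for learned query/key/value features."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)

# Hypothetical layout: 6 video-frame queries and two prompts
# (4 text tokens for prompt 0, 3 text tokens for prompt 1).
frame_segment = [0, 0, 0, 1, 1, 1]      # frames 0-2 use prompt 0, frames 3-5 use prompt 1
token_segment = [0, 0, 0, 0, 1, 1, 1]   # which prompt each text token came from

q = rand_matrix(6, 8)   # one query per frame (real models have many tokens per frame)
k = rand_matrix(7, 8)   # one key per text token
v = rand_matrix(7, 8)   # one value per text token

bias = segment_penalty(frame_segment, token_segment)
out, weights = cross_attention(q, k, v, bias)
```

Under these assumptions, each row of `weights` places essentially all of its mass on the tokens of the prompt assigned to that frame's segment. Because the bias is only added to logits that cross-attention already computes, a sketch like this needs no extra parameters or architectural changes, which is consistent with the plug-and-play framing in the abstract.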

Related papers