Fugu-MT Paper Translation (Abstract): SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models

Paper overview: SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models

arxiv url: http://arxiv.org/abs/2510.13042v1
Date: Tue, 14 Oct 2025 23:40:57 GMT
Status: Translation complete
Last updated in system: 2025-10-16 20:13:28.446374
Title: SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models
Title (reference translation): SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models
Authors: Zhengxu Tang, Zizheng Wang, Luning Wang, Zitao Shuai, Chenhao Zhang, Siyu Qian, Yirui Wu, Bohao Wang, Haosong Rao, Zhenyu Yang, Chenwei Wu
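Abstract summary: This paper introduces SeqBench, a comprehensive benchmark for evaluating sequential narrative coherence in T2V generation. It uses a dataset of 320 prompts spanning various narrative complexities. The DTG-based metric shows a strong correlation with human annotations.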
Reference score (independently computed attention score): 9.237220559112837
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-video (T2V) generation models have made significant progress in creating visually appealing videos. However, they struggle with generating coherent sequential narratives that require logical progression through multiple events. Existing T2V benchmarks primarily focus on visual quality metrics but fail to evaluate narrative coherence over extended sequences. To bridge this gap, we present SeqBench, a comprehensive benchmark for evaluating sequential narrative coherence in T2V generation. SeqBench includes a carefully designed dataset of 320 prompts spanning various narrative complexities, with 2,560 human-annotated videos generated from 8 state-of-the-art T2V models. Additionally, we design a Dynamic Temporal Graphs (DTG)-based automatic evaluation metric, which can efficiently capture long-range dependencies and temporal ordering while maintaining computational efficiency. Our DTG-based metric demonstrates a strong correlation with human annotations. Through systematic evaluation using SeqBench, we reveal critical limitations in current T2V models: failure to maintain consistent object states across multi-action sequences, physically implausible results in multi-object scenarios, and difficulties in preserving realistic timing and ordering relationships between sequential actions. SeqBench provides the first systematic framework for evaluating narrative coherence in T2V generation and offers concrete insights for improving sequential reasoning capabilities in future models. Please refer to https://videobench.github.io/SeqBench.github.io/ for more details.
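Abstract (reference translation): Text-to-video (T2V) generation models have made great progress in creating visually appealing videos. However, they struggle to produce coherent sequential narratives that require logical progression through multiple events. Existing T2V benchmarks focus mainly on visual quality metrics and fail to evaluate narrative coherence over extended sequences. To fill this gap, we propose SeqBench, a comprehensive benchmark for evaluating sequential narrative coherence in T2V generation. SeqBench includes a carefully designed dataset of 320 prompts spanning various narrative complexities, with 2,560 human-annotated videos generated from 8 state-of-the-art T2V models. In addition, we design an automatic evaluation metric based on Dynamic Temporal Graphs (DTG) that captures long-range dependencies and temporal ordering while maintaining computational efficiency. Our DTG-based metric shows a strong correlation with human annotations. Systematic evaluation with SeqBench reveals critical limitations in current T2V models: failure to maintain consistent object states across multi-action sequences, physically implausible results in multi-object scenarios, and difficulty preserving realistic timing and ordering relationships between sequential actions. SeqBench provides the first systematic framework for evaluating narrative coherence in T2V generation and offers concrete insights for improving sequential reasoning capabilities in future models. See https://videobench.github.io/SeqBench.github.io/ for details.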
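
The abstract names temporal ordering between sequential actions as one property the benchmark checks. The following is a minimal sketch of that general idea only, not the paper's DTG-based metric, whose construction is not described on this page. It assumes a hypothetical list of "A happens before B" constraints derived from a prompt and hypothetical timestamps at which each event was detected in a generated video; the function name `ordering_score`, the event names, and the timestamps are all invented for illustration.

```python
from typing import Dict, List, Tuple


def ordering_score(
    constraints: List[Tuple[str, str]],
    observed_times: Dict[str, float],
) -> float:
    """Return the fraction of 'before -> after' constraints that hold.

    Illustrative only: this is not the DTG metric from the paper.
    """
    if not constraints:
        # No ordering requirements means nothing can be violated.
        return 1.0
    satisfied = 0
    for before, after in constraints:
        t_before = observed_times.get(before)
        t_after = observed_times.get(after)
        # A constraint counts only if both events were detected, in order.
        if t_before is not None and t_after is not None and t_before < t_after:
            satisfied += 1
    return satisfied / len(constraints)


# Hypothetical prompt: "pick up the cup, pour water, then drink".
constraints = [("pick_up_cup", "pour_water"), ("pour_water", "drink")]
# Hypothetical detection times in seconds; "drink" appears before "pour_water".
observed = {"pick_up_cup": 0.5, "drink": 1.8, "pour_water": 2.4}

print(ordering_score(constraints, observed))  # 0.5
```

In this toy case one of the two ordering constraints is violated, so the score is 0.5. A missing event counts as a violation, which is a design choice of this sketch rather than something stated in the source.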

Related papers