Fugu-MT Paper Translation (Abstract): LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

Paper overview: LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

arxiv url: http://arxiv.org/abs/2603.28082v1
Date: Mon, 30 Mar 2026 06:37:12 GMT
Status: translation complete
Last updated in system: 2026-03-31 23:18:45.261153
Title: LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
Title (reference translation): LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
Authors: Chutian Meng, Fan Ma, Chi Zhang, Jiaxu Miao, Yi Yang, Yueting Zhuang
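Abstract summary: We propose LogiStory, a logic-aware framework for multi-image story visualization. The framework is built around the central innovation of explicitly modeling visual logic in story visualization. This work provides a foundational step toward modeling and enforcing visual logic in general image sequence and video generation tasks.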
Reference score (independently computed attention score): 59.35938978648807
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generating coherent and communicative visual sequences, such as image sequences and videos, remains a significant challenge for current multimodal systems. Despite advances in visual quality and the integration of world knowledge, existing models still struggle to maintain logical flow, often resulting in disjointed actions, fragmented narratives, and unclear storylines. We attribute these issues to the lack of attention to visual logic, a critical yet underexplored dimension of visual sequence generation that we define as the perceptual and causal coherence among characters, actions, and scenes over time. To bridge this gap, we propose a logic-aware multi-image story visualization framework, LogiStory. The framework is built around the central innovation of explicitly modeling visual logic in story visualization. To realize this idea, we design a multi-agent system that grounds roles, extracts causal chains, and verifies story-level consistency, transforming narrative coherence from an implicit byproduct of image generation into an explicit modeling objective. This design effectively bridges structured story planning with visual generation, enhancing both narrative clarity and visual quality in story visualization. Furthermore, to evaluate the generation capacity, we construct LogicTale, a benchmark comprising richly annotated stories, emphasizing causal reasoning, and visual logic interpretability. We establish comprehensive automatic and human evaluation protocols designed to measure both visual logic and perceptual quality. Experiments demonstrate that our approach significantly improves the narrative logic of generated visual stories. This work provides a foundational step towards modeling and enforcing visual logic in general image sequence and video generation tasks.
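Abstract (reference translation): Generating coherent and communicative visual sequences, such as image sequences and videos, is a major challenge for current multimodal systems. Despite advances in visual quality and the integration of world knowledge, existing models struggle to maintain logical flow, which often leads to disjointed actions, fragmented narratives, and unclear storylines. We attribute these problems to a lack of attention to visual logic, which we define as the perceptual and causal coherence among characters, actions, and scenes over time. To bridge this gap, we propose LogiStory, a logic-aware multi-image story visualization framework. The framework is built around the central innovation of explicitly modeling visual logic in story visualization. To realize this idea, we design a multi-agent system that grounds roles, extracts causal chains, and verifies story-level consistency, turning narrative coherence from an implicit byproduct of image generation into an explicit modeling objective. This design effectively bridges structured story planning and visual generation, improving both narrative clarity and visual quality in story visualization. In addition, to evaluate generation capacity, we construct LogicTale, a benchmark of richly annotated stories that emphasizes causal reasoning and visual logic interpretability. We establish comprehensive automatic and human evaluation protocols designed to measure both visual logic and perceptual quality. Experiments show that our approach significantly improves the narrative logic of generated visual stories. This work provides a foundational step toward modeling and enforcing visual logic in general image sequence and video generation tasks.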

Related papers