Fugu-MT Paper Translation (Abstract): FreeStory: Training-Free Character Consistency for Free-Form Visual Storytelling

Paper overview: FreeStory: Training-Free Character Consistency for Free-Form Visual Storytelling

arxiv url: http://arxiv.org/abs/2606.25079v1
Date: Tue, 23 Jun 2026 18:37:31 GMT
Status: Translation complete
System update date: 2026-06-25 17:05:30.118663
Title: FreeStory: Training-Free Character Consistency for Free-Form Visual Storytelling
Authors: Sibo Dong, Ismail Shaheen, Sarah Adel Bargal
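Abstract summary: FreeStory is a training-free framework that reformulates character consistency under free-form prompts as entity-grounded feature reuse. The method associates reference mentions with their corresponding character descriptions and combines dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending. Experiments show that FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks and stronger overall consistency than baselines under free-form prompts.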
Reference score (independently computed attention score): 4.671002796177002
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Visual storytelling aims to generate image sequences that are both aligned with narrative prompts and consistent in character appearance across images. Recent training-free methods improve character consistency by reusing attention features, but rely on structured prompts where full character descriptions are repeated in every prompt. This assumption simplifies the task but deviates from natural storytelling, where characters are typically introduced once and later referred to using pronouns or type-based expressions. We propose FreeStory, a training-free framework that reformulates character consistency under free-form prompts as entity-grounded feature reuse. Our method associates reference mentions with their corresponding character descriptions and combines dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending to preserve identity while retaining generation diversity. We also introduce FreeStoryBench, a benchmark for this setting that includes both single- and multi-character stories. Experiments show that FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks and stronger overall consistency over baselines under free-form prompts.
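The abstract names key-value injection and query blending without giving their formulation. The sketch below is a generic, toy illustration of how such feature reuse is commonly set up in attention layers. It is not the authors' implementation. The function names (`inject_reference_kv`, `blend_queries`), the boolean `ref_mask`, the `match` index list, and the blend weight `alpha` are all assumptions introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def inject_reference_kv(tgt_k, tgt_v, ref_k, ref_v, ref_mask):
    """Append reference keys/values, but only at positions the
    (hypothetical) character mask marks as belonging to the character."""
    keep = [i for i, inside in enumerate(ref_mask) if inside]
    keys = list(tgt_k) + [ref_k[i] for i in keep]
    values = list(tgt_v) + [ref_v[i] for i in keep]
    return keys, values


def blend_queries(tgt_q, ref_q, match, alpha):
    """Linearly blend each target query with its matched reference query.
    match[i] is an index into ref_q, or None when there is no match."""
    blended = []
    for q, j in zip(tgt_q, match):
        if j is None:
            blended.append(list(q))
        else:
            blended.append([(1 - alpha) * a + alpha * b
                            for a, b in zip(q, ref_q[j])])
    return blended


# Toy usage: one target token attends over its own features plus the
# masked reference features from an earlier image.
ref_k = [[1.0, 0.0], [0.0, 1.0]]
ref_v = [[10.0, 0.0], [0.0, 10.0]]
keys, values = inject_reference_kv(
    [[0.0, 1.0]], [[0.0, 1.0]], ref_k, ref_v, ref_mask=[True, False]
)
queries = blend_queries([[1.0, 0.0]], [[1.0, 1.0]], match=[0], alpha=0.5)
output = attention(queries, keys, values)
```

In this toy setup, `alpha = 0` leaves the target queries untouched and larger values pull them toward the reference. This mirrors the trade-off the abstract describes between preserving identity and retaining generation diversity.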


Related papers