Fugu-MT Paper Translation (Abstract): Self-Reasoning Agentic Framework for Narrative Product Grid-Collage Generation

Paper abstract: Self-Reasoning Agentic Framework for Narrative Product Grid-Collage Generation

arxiv url: http://arxiv.org/abs/2604.16958v1
Date: Sat, 18 Apr 2026 10:40:31 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.261147
Title: Self-Reasoning Agentic Framework for Narrative Product Grid-Collage Generation
Authors: Minyan Luo, Yuxin Zhang, Yifei Li, Xincan Wang, Fuzhang Wu, Tong-Yee Lee, Oliver Deussen, Weiming Dong
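Abstract summary: We propose a self-reasoning agentic framework for product grid-collage generation. Compared to direct prompting baselines, the framework consistently improves aesthetic quality, narrative richness, and visual coherence.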
Reference score (independently computed attention score): 36.34342923312848
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Narrative-driven product photography has become a prevalent paradigm in modern marketing, as coherent visual storytelling helps convey product value and establishes emotional engagement with consumers. However, existing image generation methods do not support structured narrative planning or cross-panel coordination, often resulting in weak storytelling and visual incoherence. In practice, narrative product photography is commonly presented as multi-grid collages, where multiple views or scenes jointly communicate a product narrative. To ensure visual consistency across grids and aesthetic harmony of the overall composition, we generate the collage as a single unified image rather than composing independently synthesized panels. We propose a self-reasoning agentic framework for narrative product grid collage generation. Given a product packshot and its name, the system first constructs a Product Narrative Framework that explicitly represents the product's identity, usage context, and situational environment, and translates it into complementary grids governed by a shared visual style. Constraint-aware prompts are then compiled and fed to a generation model that synthesizes the collage jointly. The generated output is evaluated on both content validity and photography quality, with explicit gates determining whether to proceed or refine. When evaluation fails, the system performs failure attribution and applies targeted refinement, enabling progressive improvement through iterative self-reflection. Experiments demonstrate that our framework consistently improves aesthetic quality, narrative richness, and visual coherence, compared to direct prompting baselines.
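The generate, evaluate, and refine loop described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `Evaluation`, `refine_until_pass`, `quality_threshold`, `max_rounds`, and the toy stand-in functions are all hypothetical, and the abstract does not state how the gates are scored or how many refinement rounds are allowed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    """Result of one evaluation pass (hypothetical structure)."""
    content_valid: bool                    # content-validity gate
    photo_quality: float                   # photography-quality score, assumed in [0, 1]
    failures: List[str] = field(default_factory=list)  # attributed failure causes


def refine_until_pass(
    prompt: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Evaluation],
    refine: Callable[[str, List[str]], str],
    quality_threshold: float = 0.7,        # assumed value, not from the paper
    max_rounds: int = 3,                   # assumed value, not from the paper
) -> Tuple[str, int, bool]:
    """Generate a collage, check both gates, and refine the prompt on failure.

    Returns the last collage, the number of refinement rounds used,
    and whether both gates were passed.
    """
    rounds = 0
    while True:
        collage = generate(prompt)
        result = evaluate(collage)
        passed = result.content_valid and result.photo_quality >= quality_threshold
        if passed or rounds == max_rounds:
            return collage, rounds, passed
        # Failure attribution feeds a targeted change to the prompt.
        prompt = refine(prompt, result.failures)
        rounds += 1


# Toy stand-ins so the loop can be run without any image model.
def toy_generate(prompt: str) -> str:
    return f"collage<{prompt}>"


def toy_evaluate(collage: str) -> Evaluation:
    ok = "consistent lighting" in collage
    return Evaluation(True, 0.9 if ok else 0.4, [] if ok else ["lighting"])


def toy_refine(prompt: str, failures: List[str]) -> str:
    if "lighting" in failures:
        return prompt + " | consistent lighting"
    return prompt


if __name__ == "__main__":
    print(refine_until_pass("3-grid collage of a ceramic mug",
                            toy_generate, toy_evaluate, toy_refine))
```

In this sketch the evaluator and refiner are passed in as plain callables, so the gate logic stays separate from whichever generation model is used. A real system would replace the toy functions with model calls.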
