Fugu-MT Paper Translation (Summary): MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks

Paper summary: MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks

arxiv url: http://arxiv.org/abs/2604.23788v1
Date: Sun, 26 Apr 2026 16:25:30 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.549198
Title: MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks
Title (reference translation): MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks
Authors: Jui-Cheng Chiu, Yu-Chao Wang, Shengyang Luo, Tongyan Wang, Qi Yang, Nabin Khanal, Yingjie Victor Chen
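Abstract summary: MIRAGE is an evidence-centric framework designed to scaffold the exploration of "micro-interactions" in multi-figure artworks. The results show that MIRAGE significantly improves identity consistency, reduces relational hallucinations, and increases the coverage of subtle interactions.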
Reference score (attention score, computed in-house): 9.397297838455238
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Appreciating multi-figure paintings requires understanding how characters relate through subtle cues like gaze alignment, gesture, and spatial arrangement. We present MIRAGE, an evidence-centric framework designed to scaffold the exploration of these "micro-interactions" in multi-figure artworks. While such cues are essential for deep narrative appreciation, they are often distributed across complex scenes and difficult for viewers to systematically identify. Existing vision-language models (VLMs) frequently fail to provide reliable assistance, offering ungrounded interpretations that lack traceable visual evidence. MIRAGE addresses this by constructing a structured intermediate representation capturing identities, pose cues, and gaze hypotheses. However, the challenge extends beyond extracting these cues to coordinating them during interpretation. Without an explicit mechanism to organize and reconcile relational evidence, models often collapse multiple interaction hypotheses into a single unstable or weakly grounded narrative, even when low-level signals are available. The structured representation allows users to verify how high-level interpretations are anchored in low-level visual facts. By separating spatial grounding from narrative generation, MIRAGE enables users to inspect and reason about figure-to-figure relationships through a verifiable evidence layer. We evaluate MIRAGE against painting-only VLM baselines using a blind assessment protocol. Results show that MIRAGE significantly improves identity consistency, reduces relational hallucinations, and increases the coverage of subtle interactions. These findings suggest that structured grounding can serve as a critical interaction control layer, providing the necessary scaffolding for a more reliable, transparent, and human-led understanding of complex visual narratives.
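Abstract (reference translation): Appreciating multi-figure paintings requires understanding how characters relate to one another through subtle cues such as gaze alignment, gesture, and spatial arrangement. MIRAGE, an evidence-centric framework, is designed to scaffold the exploration of these "micro-interactions" in multi-figure artworks. Such cues are essential for deep narrative appreciation, but they are often distributed across complex scenes and are difficult for viewers to identify systematically. Existing vision-language models (VLMs) often fail to provide reliable assistance and offer ungrounded interpretations that lack traceable visual evidence. MIRAGE addresses this problem by constructing a structured intermediate representation that captures identities, pose cues, and gaze hypotheses. However, the challenge extends beyond extracting these cues to coordinating them during interpretation. Without an explicit mechanism for organizing and reconciling relational evidence, models often collapse multiple interaction hypotheses into a single unstable or weakly grounded narrative, even when low-level signals are available. The structured representation lets users verify how high-level interpretations are anchored in low-level visual facts. By separating spatial grounding from narrative generation, MIRAGE enables users to inspect and reason about figure-to-figure relationships through a verifiable evidence layer. We evaluated MIRAGE against painting-only VLM baselines using a blind assessment protocol. The results show that MIRAGE significantly improves identity consistency, reduces relational hallucinations, and increases the coverage of subtle interactions. These results suggest that structured grounding can serve as a critical interaction control layer, providing the scaffolding needed for a more reliable, transparent, and human-led understanding of complex visual narratives.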
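The abstract describes a structured intermediate representation (identities, pose cues, gaze hypotheses) that sits between the image and the narrative, so that each relational claim can be traced back to visual evidence. The Python sketch below illustrates that separation of a grounding layer from a narrative layer. It is a hypothetical toy, not MIRAGE's implementation. The schema names (`Figure`, `RelationClaim`), the 2D gaze direction vectors, the test "gaze ray points at the other figure's bounding-box center", and the 20-degree tolerance are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Figure:
    """One identified figure (hypothetical schema, not MIRAGE's)."""
    figure_id: str                            # stable identity label, e.g. "F1"
    bbox: tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    head: tuple[float, float]                 # head position, e.g. from a pose estimator
    gaze_dir: tuple[float, float] | None = None  # 2D gaze direction; None = no hypothesis


@dataclass
class RelationClaim:
    """A figure-to-figure relation plus the evidence a viewer can check."""
    source: str
    target: str
    kind: str
    evidence: dict = field(default_factory=dict)


def bbox_center(bbox: tuple[float, float, float, float]) -> tuple[float, float]:
    x0, y0, x1, y1 = bbox
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def gaze_claims(figures: list[Figure], max_angle_deg: float = 20.0) -> list[RelationClaim]:
    """Grounding layer: emit a 'looks_at' claim only when a figure's gaze ray
    points at another figure's box center within max_angle_deg."""
    claims: list[RelationClaim] = []
    for a in figures:
        if a.gaze_dir is None:
            continue  # no gaze hypothesis, so no claim rather than a guess
        gx, gy = a.gaze_dir
        g_norm = math.hypot(gx, gy)
        if g_norm == 0:
            continue
        for b in figures:
            if b.figure_id == a.figure_id:
                continue
            cx, cy = bbox_center(b.bbox)
            dx, dy = cx - a.head[0], cy - a.head[1]
            d_norm = math.hypot(dx, dy)
            if d_norm == 0:
                continue
            # Angle between the gaze ray and the head-to-target direction.
            cos_angle = (gx * dx + gy * dy) / (g_norm * d_norm)
            angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
            if angle <= max_angle_deg:
                claims.append(RelationClaim(
                    source=a.figure_id,
                    target=b.figure_id,
                    kind="looks_at",
                    evidence={"angle_deg": round(angle, 1), "target_bbox": b.bbox},
                ))
    return claims


def narrate(claims: list[RelationClaim]) -> list[str]:
    """Narrative layer: restates only claims that carry evidence."""
    return [
        f"{c.source} {c.kind.replace('_', ' ')} {c.target} "
        f"(gaze offset {c.evidence['angle_deg']} deg)"
        for c in claims
        if c.evidence
    ]


# Toy scene: F1 and F2 face each other, F3 has no usable gaze estimate.
figures = [
    Figure("F1", (0, 0, 100, 200), head=(50, 30), gaze_dir=(1.0, 0.0)),
    Figure("F2", (300, 0, 400, 200), head=(350, 30), gaze_dir=(-1.0, 0.0)),
    Figure("F3", (150, 400, 250, 600), head=(200, 430)),
]
claims = gaze_claims(figures)
for line in narrate(claims):
    print(line)
```

In this toy scene only the mutual gaze between F1 and F2 survives; F3 produces no claim because it has no gaze hypothesis. The design choice worth noting is that `narrate` never sees the image. It can only restate claims whose evidence (here, the gaze offset angle and target box) a viewer could check, which is the kind of traceability the abstract argues for.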


Related papers