Fugu-MT Paper Translation (Overview): MetaLogic: Robustness Evaluation of Text-to-Image Models via Logically Equivalent Prompts

Paper overview: MetaLogic: Robustness Evaluation of Text-to-Image Models via Logically Equivalent Prompts

arxiv url: http://arxiv.org/abs/2510.00796v1
Date: Wed, 01 Oct 2025 11:51:13 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.537115
Title: MetaLogic: Robustness Evaluation of Text-to-Image Models via Logically Equivalent Prompts
Title (reference translation): MetaLogic: Robustness evaluation of text-to-image models via logically equivalent prompts
Authors: Yifan Shen, Yangyang Shu, Hye-young Paik, Yulei Sui
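Abstract summary: Text-to-image (T2I) models struggle to maintain semantic consistency when their input undergoes linguistic variation. The proposed MetaLogic is a novel evaluation framework for detecting T2I misalignment.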
Reference score (attention score computed in-house): 13.010772460971374
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in text-to-image (T2I) models, especially diffusion-based architectures, have significantly improved the visual quality of generated images. However, these models continue to struggle with a critical limitation: maintaining semantic consistency when input prompts undergo minor linguistic variations. Despite being logically equivalent, such prompt pairs often yield misaligned or semantically inconsistent images, exposing a lack of robustness in reasoning and generalisation. To address this, we propose MetaLogic, a novel evaluation framework that detects T2I misalignment without relying on ground truth images. MetaLogic leverages metamorphic testing, generating image pairs from prompts that differ grammatically but are semantically identical. By directly comparing these image pairs, the framework identifies inconsistencies that signal failures in preserving the intended meaning, effectively diagnosing robustness issues in the model's logic understanding. Unlike existing evaluation methods that compare a generated image to a single prompt, MetaLogic evaluates semantic equivalence between paired images, offering a scalable, ground-truth-free approach to identifying alignment failures. It categorises these alignment errors (e.g., entity omission, duplication, positional misalignment) and surfaces counterexamples that can be used for model debugging and refinement. We evaluate MetaLogic across multiple state-of-the-art T2I models and reveal consistent robustness failures across a range of logical constructs. We find that even the SOTA text-to-image models like Flux.dev and DALLE-3 demonstrate a 59 percent and 71 percent misalignment rate, respectively. Our results show that MetaLogic is not only efficient and scalable, but also effective in uncovering fine-grained logical inconsistencies that are overlooked by existing evaluation metrics.
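Abstract (reference translation): Recent advances in text-to-image (T2I) models, particularly diffusion-based architectures, have greatly improved the visual quality of generated images. However, these models still suffer from an important limitation: maintaining semantic consistency when the input prompt undergoes small linguistic changes. Even though such prompt pairs are logically equivalent, they often produce misaligned or semantically inconsistent images, exposing a lack of robustness in reasoning and generalisation. To address this, we propose MetaLogic, a new evaluation framework that detects T2I misalignment. MetaLogic uses metamorphic testing to generate image pairs from prompts that are grammatically different but semantically identical. By comparing these image pairs directly, the framework identifies inconsistencies that signal a failure to preserve the intended meaning, effectively diagnosing robustness problems in the model's logical understanding. Unlike existing evaluation methods that compare a generated image with a single prompt, MetaLogic evaluates the semantic equivalence between paired images, providing a scalable, ground-truth-free approach to identifying alignment failures. It categorises these alignment errors (e.g., entity omission, duplication, positional misalignment) and surfaces counterexamples that can be used for model debugging and refinement. We evaluate MetaLogic across multiple state-of-the-art T2I models and reveal consistent robustness failures across a range of logical constructs. We find that even SOTA text-to-image models such as Flux.dev and DALLE-3 show misalignment rates of 59% and 71%, respectively. These results suggest that MetaLogic is not only efficient and scalable but also effective at uncovering fine-grained logical inconsistencies that existing evaluation metrics overlook.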
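The abstract describes the method only at a high level: build logically equivalent prompt pairs, compare the two resulting images with each other instead of with a ground-truth image, and categorise any inconsistency. The Python sketch below illustrates that metamorphic-testing idea under loudly stated assumptions. It is not MetaLogic's implementation. Each generated image is assumed to have already been reduced to a list of detected entities (a label plus the horizontal centre of its bounding box) by some detector that is not shown. Only one metamorphic relation is used (swapping the arguments of a left/right relation). The names `Detection`, `CONVERSE`, `equivalent_prompts`, `classify` and `misalignment_rate`, as well as the definition of the rate as "fraction of pairs with at least one error", are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


# HYPOTHETICAL: a generated image already reduced to detected entities by some
# object detector (not shown). Each entity has a label and the x-coordinate of
# its bounding-box centre, normalised to [0, 1].
@dataclass(frozen=True)
class Detection:
    label: str
    x: float


# Horizontal spatial relations and their converses (hypothetical relation set).
# "A to the left of B" is logically equivalent to "B to the right of A".
CONVERSE = {
    "to the left of": "to the right of",
    "to the right of": "to the left of",
}


def equivalent_prompts(a: str, relation: str, b: str) -> tuple[str, str]:
    """Build a pair of grammatically different, logically equivalent prompts."""
    return f"a {a} {relation} a {b}", f"a {b} {CONVERSE[relation]} a {a}"


def classify(img_a: list[Detection], img_b: list[Detection]) -> set[str]:
    """Compare two images generated from equivalent prompts (no ground truth).

    Returns the set of error categories found; an empty set means consistent.
    """
    errors: set[str] = set()
    count_a = Counter(d.label for d in img_a)
    count_b = Counter(d.label for d in img_b)
    for label in count_a.keys() | count_b.keys():
        n_a, n_b = count_a[label], count_b[label]  # Counter yields 0 if absent
        if n_a != n_b:
            # Absent from one image -> omission; present in both with
            # different counts -> duplication.
            errors.add("entity omission" if min(n_a, n_b) == 0 else "entity duplication")
    if not errors:
        # Same entities in both images: their left-to-right order should match.
        order_a = [d.label for d in sorted(img_a, key=lambda d: d.x)]
        order_b = [d.label for d in sorted(img_b, key=lambda d: d.x)]
        if order_a != order_b:
            errors.add("positional misalignment")
    return errors


def misalignment_rate(pairs: list[tuple[list[Detection], list[Detection]]]) -> float:
    """Fraction of image pairs with at least one detected inconsistency."""
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if classify(a, b)) / len(pairs)


if __name__ == "__main__":
    print(equivalent_prompts("cat", "to the left of", "dog"))
    cat_dog = [Detection("cat", 0.2), Detection("dog", 0.8)]
    consistent = (cat_dog, [Detection("cat", 0.3), Detection("dog", 0.7)])
    swapped = (cat_dog, [Detection("dog", 0.2), Detection("cat", 0.8)])
    missing = (cat_dog, [Detection("cat", 0.5)])
    for a, b in (consistent, swapped, missing):
        print(classify(a, b))
    print(misalignment_rate([consistent, swapped, missing]))
```

In the usage example, the first pair is consistent, the second is flagged as a positional misalignment, and the third as an entity omission, so two of the three pairs count toward the rate. Comparing left-to-right order rather than raw coordinates tolerates layout differences between two valid generations while still catching swapped entities. A real evaluator would need an actual detector or vision-language model to produce the detections, plus further logical constructs (counting, negation, vertical relations) that this sketch omits.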


Related papers