Fugu-MT Paper Translation (Abstract): DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models

Paper overview: DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models

arxiv url: http://arxiv.org/abs/2605.06170v1
Date: Thu, 07 May 2026 12:53:51 GMT
Status: translation complete
Last updated in system: 2026-05-08 22:27:11.796523
Title: DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models
Authors: Juntong Wang, Jiarui Wang, Huiyu Duan, Lewei Li, Guangtao Zhai, Xiongkuo Min
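Abstract summary: We propose DynT2I-Eval, a fully automated dynamic evaluation framework for text-to-image (T2I) models. It constructs a structured visual semantic space from long-form descriptions and decomposes prompts into controllable dimensions. DynT2I-Eval evaluates model performance across text alignment, perceptual quality, and aesthetics.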
Reference score (attention score computed independently by this site): 78.62380562116135
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing text-to-image (T2I) benchmarks largely rely on fixed prompt sets, leaving them vulnerable to overfitting and benchmark contamination once publicly released and repeatedly reused. In this work, we propose DynT2I-Eval, a fully automated dynamic evaluation framework for T2I models. It constructs a structured visual semantic space from long-form descriptions, decomposing prompts into controllable dimensions (e.g., subject, logical constraint, environment, and composition). This enables the continuous generation of fresh prompts via task-specific spaces and difficulty-aware sampling. DynT2I-Eval evaluates model performance across text alignment, perceptual quality, and aesthetics. Heterogeneous outputs are unified into prompt-conditioned pairwise comparisons, allowing a dynamic scheduler, micro-batch aggregation, and weighted Bayesian updates to maintain a stable online leaderboard despite changing prompt distributions and model injection. Experiments with independently sampled prompt streams demonstrate that continually refreshed prompts provide a robust evaluation protocol, reducing the impact of prompt-set-specific tuning. Simulations and ablations further confirm that the proposed ranking framework achieves a strong balance among cold-start convergence, late-entry discovery, and long-run ranking fidelity.
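The abstract mentions prompt-conditioned pairwise comparisons, micro-batch aggregation, and weighted Bayesian updates, but does not specify the update rule. As a rough illustration only, the sketch below assumes a simple Beta posterior over each model's win rate, where every comparison adds a fractional pseudo-count equal to its weight. The names `BetaPosterior`, `apply_micro_batch`, and `leaderboard`, the model names, and the weights are all hypothetical; this is not the paper's ranking method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief over a model's pairwise win rate (assumed model)."""
    alpha: float = 1.0  # prior pseudo-count of wins (uniform prior)
    beta: float = 1.0   # prior pseudo-count of losses

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def apply_micro_batch(board, batch):
    """Aggregate a batch of (winner, loser, weight) outcomes, then update once.

    The weight is a hypothetical stand-in for whatever confidence or
    difficulty signal a real system would attach to a comparison.
    """
    wins = defaultdict(float)
    losses = defaultdict(float)
    for winner, loser, weight in batch:
        wins[winner] += weight
        losses[loser] += weight
    # Models seen for the first time (late entries) start from the prior.
    for name in set(wins) | set(losses):
        posterior = board.setdefault(name, BetaPosterior())
        posterior.alpha += wins[name]
        posterior.beta += losses[name]


def leaderboard(board):
    """Return model names ordered by posterior mean win rate, best first."""
    return sorted(board, key=lambda name: board[name].mean, reverse=True)


if __name__ == "__main__":
    board = {}
    apply_micro_batch(board, [
        ("model_a", "model_b", 1.0),
        ("model_a", "model_c", 0.5),
        ("model_b", "model_c", 1.0),
    ])
    print(leaderboard(board))
```

Aggregating a micro-batch before touching the posteriors means each model is updated once per batch, which is one plausible way to keep an online leaderboard stable as new prompts and models arrive.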
